CAROL LOEB MIR. A Comparison of String Handling in Four
Programming Languages. (Under the direction of PETER
CALINGAERT.)

The thesis compares character string handling in the
programming languages SNOBOL4, TRAC, APL, and PL/I. The
first two languages are representatives of string processing
languages, while the latter two represent general purpose
programming languages. A description of each language is
given. Also included are examples of string handling
problems coded in the four languages. The languages are
compared on the basis of their string handling abilities and
not on the basis of implementation-dependent
characteristics.
1 INTRODUCTION

The purpose of this thesis is to compare character
string handling in different programming languages. Of
particular concern are string operations in text handling.
Sammet mentions [19, p. 385]

The text material can be either natural language
of some kind (e.g., this sentence), a string
composed of a program in any language, or any
arbitrary sequence of characters from some
particular data area.

This thesis considers only natural language text material.
Of course, this could be generalized to other special uses
of string handling.

String processing and list processing languages are
examples of symbol manipulation languages. The data which
they manipulate are symbols, not numbers. Symbol manipula-
tion languages are used in such areas as compiler writing,
theorem proving, formula manipulation, and text processing.

Many accounts treat strings and lists together, but it
is important to differentiate between them. A string is a
sequence of characters; it is a data type in many program-
ming languages. A list, on the other hand, is a structure
of data, which may or may not be characters. Sammet [19,
p. 385] distinguishes between a string and a list by noting
that the list is a way of storing information rather than a
type of information to be represented.

String handling operations include concatenation of two
strings, searching for a pattern, and replacing one pattern 
with another. Examples of list processing operations are 
putting information into a list, deleting information from a 
list, and combining two lists. 
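
As an illustration only, the three string handling operations just named can be sketched in Python, which is not one of the languages compared in the thesis. The sample sentence and the variable names are invented for the example.

```python
# Illustrative sketch only: Python is not one of the four languages compared.
sentence = "THE PICTURE ON THE WALL"   # hypothetical sample text

# Concatenation of two strings.
joined = "THE PICTURE" + " ON THE WALL"

# Searching for a pattern: find() gives the index of the first match, or -1.
position = sentence.find("ON")

# Replacing one pattern with another (first occurrence only).
replaced = sentence.replace("WALL", "DOOR", 1)

print(joined == sentence, position, replaced)
```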

Since only string operations are of concern in this
thesis, the following symbol manipulation languages are
excluded from consideration: [see reference 17]

list processors, such as LISP 1.5 and IPL-V;
linked block languages, such as L6;
pattern-directed structure processors, like
CONVERT and FLIP.

The last group of languages performs string-like operations,
but they operate on LISP list structures, not character
strings.

Text editors like TEXT360 are useful for publishing
documents. These editors include commands for line and
document updating, which are string handling tasks. For
example, inserting a phrase in the middle of a sentence is 
essentially a pattern matching task. However, their com- 
mands do not give an insight into how string problems are 
dealt with, so text editors are not included in the thesis. 

The thesis compares string handling in two kinds of
languages. These are string processing languages and gener-
al purpose programming languages with built-in string hand-
ling capabilities. String processing languages can be
classified as pattern-directed string processors and macro-
expander string processors.

Included in pattern-directed string processors are all
versions of the PANON, COMIT, and SNOBOL languages. These
languages use the generalized Markov algorithm as a way of
defining string processing operations. The Markov algorithm
consists of a series of transformation rules. The languages
perform substitutions on a string depending on the structure
of the string according to the transformation rules. (For
more information on the subject see [5].)
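
The idea of a series of transformation rules can be sketched as follows. This is a minimal Python illustration of a Markov algorithm in its usual textbook form (the first applicable rule is applied at its leftmost match, then the rule list is restarted), not the mechanism of any particular language above. The function name, the rule format, and the sample rule are assumptions made for the example.

```python
def markov(rules, text, max_steps=1000):
    """Apply the first applicable rule at its leftmost match, then restart.

    rules is an ordered list of (pattern, replacement, is_terminal) triples.
    """
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            index = text.find(pattern)
            if index >= 0:
                text = text[:index] + replacement + text[index + len(pattern):]
                if is_terminal:
                    return text
                break            # restart from the first rule
        else:
            return text          # no rule applies: the algorithm halts
    raise RuntimeError("step limit reached")


# Hypothetical rule set: move every A in front of every B.
rules = [("BA", "AB", False)]
print(markov(rules, "BABA"))
```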

These languages, in particular PANON, may be used
effectively to write the syntax analysis phase of compilers.
In such cases a program is regarded as a long string to be
analyzed. PANON is not considered in the thesis since it is
more like a syntax-driven compiler than a string processor
[3]. SNOBOL4, which includes many of COMIT's features, is
discussed in detail. A main factor for using SNOBOL4 for
comparison was the availability of an implementation. Also,
COMIT lacks some desirable language features, such as the
ability to name strings, and facilities for easy arithmetic
operations.






Two languages which are in the category of macro-
expander string processors are GPM and TRAC. To perform any
operation in these languages (input/output, arithmetic,
assignment, etc.), a macro must be called with the necessary
parameters. Since the TRAC language is so different from
other programming languages and does include several string
handling functions, it has been included.

PL/I, unlike most other general purpose programming
languages, provides good string handling capabilities and is
included in the discussion. APL, also considered, is an
example of a general purpose programming language that
provides for character data but does not have good string
handling functions.

The four languages included in the thesis, then, are
SNOBOL4, TRAC, PL/I, and APL. A brief summary of each
language is in Chapter 2.

In Chapter 3 two easy string problems are coded in each
language. Also included in the chapter is a rather diffi-
cult string handling problem coded in SNOBOL4, PL/I, and
APL.

Chapter 4 includes comparisons of the languages on the
bases of what string operations are primitive in each
language, and of ways string operations that are not
primitive in a language might be coded in that language.
The possible string handling problems for which the lan-
guages are suited or not suited are discussed.

All comparisons of the languages in the thesis are made
on the basis of language features. Implementation-dependent
considerations, such as compilation time, execution speed,
and amount of storage used, have not been considered. A
good comparison based on these latter criteria would have
been extremely difficult for the following reasons. PL/I,
TRAC, and SNOBOL4 programs were batch processed, but APL
programs used an interactive time sharing system. TRAC,
SNOBOL4, and APL were executed interpretively, but PL/I was
compiled into an object deck for later execution. Thus,
these differences would tend to hide results that might be
evident from a comparison of more similar implementations.

The languages are examined on the basis of the string 
operations which are primitive in them, not string opera- 
tions that can be added with a subroutine capability. A 
good programmer can code any string handling operation that 
he needs, but this should not figure in a language compari- 
son, unless the language had no facilities for defining new 
string functions. 

SNOBOL4 programs were run interpretively on an IBM
370/165 in batch mode. TRAC programs were run interpretive-
ly on an IBM 360/75 in batch mode. PL/I programs were run
on an IBM 360/75 using the IBM PL/I F compiler. APL
programs were run interpretively on an IBM 370/165 in a time
sharing environment.



2 LANGUAGE DESCRIPTIONS



In this chapter a brief summary of each language is 
given. The language features discussed include data types, 
statement types, and functions. 

2.1 SNOBOL4

SNOBOL is a string processing language which originated
at Bell Laboratories in 1964; SNOBOL4 is the latest refine-
ment. Its authors are D.J. Farber, R.E. Griswold, and
I.P. Polonsky. Many of SNOBOL4's features, including its
basic statement format, are influenced by COMIT [13], an
earlier string handling language. References for the SNOBOL
language are [8], [9], and [10].

2.1.1 Data Types

There are several different data types, the most
important one being the string. Strings can be broken up
into components, operated upon, and then put together again.
Unlike what is done in COMIT, an earlier string manipulating
language, strings may be assigned names. It is also
possible to assign names to matched and partially matched
substrings by the respective operations of conditional and
immediate value assignment. An example of a string in its
literal form is 'I AM A STRING'. One may write

     X = 'I AM A STRING'
where X is a variable that is assigned the string value
'I AM A STRING'. X is considered to be of type string.

A string must often be searched for a pattern. In
SNOBOL4 a pattern is a structure that can be a string, a
number of strings joined by the concatenation operator (a
blank), a number of strings separated by the alternation
operator (a | with at least one blank on each side of it),
or possibly a combination of all three. The alternation
operator allows matching of alternate patterns. Patterns
may be combinations of both literal strings and variables
whose values are strings or patterns. Examples are the
pattern

     'EIT' 'HER' | 'OR'
(whose first alternate is equivalent to 'EITHER'), and the
pattern

     'B' VAR1 | 'B' | VAR2
(whose first alternate is a literal concatenated with a
variable). The statement

     IT = 'ONE' | 'TWO'
assigns to IT a pattern that matches either the string 'ONE'
or the string 'TWO'. If Y = 'ONE', then the pattern
Y | 'TWO' is an equivalent pattern to the previous value of
variable IT.

There are also the arithmetic data types INTEGER and
REAL, type ARRAY, and programmer-defined data types.
Declarations of the data types of variables are not present
in SNOBOL4. Instead, the type of a variable is dependent on
the variable's last assigned value.

2.1.2 Statements 

There are four different statement types: assignment,
pattern matching (without replacement), replacement, and
END. Actually all four statements follow a basic statement
format consisting of five different fields, some of which
may be absent in a particular statement. This format is:

     label  subject  pattern = object  go-to
Fields must be separated by at least one blank. If the
label field is present, it must begin in Column 1. A
statement not having a label must start in other than Column
1. There are no other specifications for the beginning of
any of the other statement components. However, no charac-
ters may appear after Column 71. Continuation cards may be
used, so fields may be as long as desired. No maximum 
length of any field is specified. Labels must begin with a 
letter or digit and extend to the first blank. The subject 
or object may be either a literal string or the name of a 
string. The pattern field may be any of the possibilities 
described previously for a pattern. The go-to field is used
to indicate conditional and unconditional branching. In the
statement

START   X = 'ABC'                         :(NEXT)

the go-to field causes the statement whose label is NEXT to
be branched to after X is assigned 'ABC'. Branching
conditionally upon success or failure of a statement is done
with a :S(label) or :F(label), respectively, in the go-to
field. (Success or failure of a statement will be explained
shortly.)

The assignment statement has already been illustrated 
in previous examples. Its format is 
     label  subject = object  go-to

label and go-to are optional. The value of the object is
assigned to the subject. 

The pattern matching and replacement statements are a
little more involved. The pattern matching statement's
format is:

     label  subject  pattern  go-to

label and go-to are optional. The entire subject is
searched for an occurrence of the first alternate of the
pattern; if it is not found, then the subject is searched
for the second alternate, etc. The statement is said to
succeed if the pattern is located in the subject; it fails
otherwise. For example, consider

        STR = 'CABABET'
FIRST   STR 'AD' | 'AB'

Statement FIRST succeeds, matching pattern 'AB' with the
first AB in the subject. A pattern matching statement with
'AD' in place of 'AD' | 'AB' in the pattern field would
fail.

The result of a replacement statement is to substitute
an object for the first occurrence of the matched pattern
alternate in the subject. The basic format of a replacement
statement is

     label  subject  pattern = object  go-to

label and go-to are optional. To replace the first B with
an R in statement FIRST, one would write:

        STR = 'CABABET'
FIRST   STR 'B' = 'R'

STR now has the value 'CARABET'. Suppose that it was
desired to replace the second B rather than the first B with
an R. Then it would be necessary to write:

        STR = 'CABABET'
FIRST   STR 'BE' = 'RE'
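
The behaviour of the pattern matching and replacement statements, as described in this section, can be imitated in Python. The sketch below is only an analogue: the function name replace_first is hypothetical, and the alternates are tried in the order the text describes (the whole subject is searched for the first alternate, then for the second, and so on).

```python
def replace_first(subject, alternates, obj):
    """Replace the first occurrence of the first alternate that is found.

    Returns the new subject and a success flag, mimicking the success or
    failure of a replacement statement.
    """
    for alternate in alternates:
        index = subject.find(alternate)
        if index >= 0:
            new_subject = subject[:index] + obj + subject[index + len(alternate):]
            return new_subject, True
    return subject, False


print(replace_first("CABABET", ["B"], "R"))     # replaces the first B
print(replace_first("CABABET", ["BE"], "RE"))   # reaches the second B through its context
print(replace_first("CABABET", ["AD"], "R"))    # no match: the statement fails
```

Because only the first occurrence is replaced, reaching the second B requires extra context in the pattern, which is exactly why the text writes 'BE' rather than 'B'.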

An END statement is simply END in the label field and 
signifies the end of a SNOBOL4 program.

The four kinds of statements and input/output are
illustrated in the following short program whose purpose is
to count the number of E's and I's in some input cards.

START   X = INPUT                         :F(END)
        SUM = 0
LOOP    X 'I' | 'E' =                     :F(OUT)
        SUM = SUM + 1                     :(LOOP)
OUT     OUTPUT = SUM                      :(START)
END

Input cards:

     HE RECEIVED A GIFT.
     A BEE STUNG THE BOY.
     OUR PROGRAMS HAD FAULTS.

Output lines:

     6
     3
     0

Execution of the statement labelled START causes one input
card to be read and assigns X the value of the card. The
go-to field :F(END) means that on failure (there are no more
input cards) the program is finished. Otherwise the normal
sequential order of the program is followed, i.e. go to the
second statement. The second statement initializes SUM to
0. In the third statement X is searched for the first
occurrence of the letter 'I'. If no I's are found, then 'E'
is to be looked for. The lack of an object after the = sign
means that the I or E is to be deleted from X. If an I
or E is found, one is added to SUM, and X is searched again.
When an I or E can no longer be found, the program branches
to the statement labelled OUT, which causes the printing of
a line with the value of SUM. The program then branches to
START. The process continues until no more cards are in the
input file, whereupon the program terminates. Notice that
after all the I's and E's are found, X is the value of the
input card with all I's and E's removed. For example, the
final value of X is 'H RCVD A GFT.' for the first input card.
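
For readers unfamiliar with SNOBOL4, the loop can be imitated in Python. This is a hedged analogue, not a translation: the function name count_and_strip is hypothetical, and a regular expression stands in for the pattern 'I' | 'E'. One letter is deleted per iteration, as in the statement labelled LOOP.

```python
import re


def count_and_strip(card):
    """Delete I's and E's one at a time, counting the deletions."""
    total = 0
    while True:
        # count=1 deletes a single I or E, like one execution of LOOP.
        card, found = re.subn("[IE]", "", card, count=1)
        if not found:
            return total, card
        total += 1


cards = ["HE RECEIVED A GIFT.", "A BEE STUNG THE BOY.", "OUR PROGRAMS HAD FAULTS."]
for card in cards:
    print(count_and_strip(card))
```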

The following program segment finds the first E or I in
X; if either letter is found, it indicates which of the two
it was. This is easily done using the conditional value
assignment operation.

        X = 'RELIVE'
        X ('I' | 'E') . FIND =            :F(OUT)
        OUTPUT = FIND

The conditional value operator is a period (.), separated on
both sides by at least one blank. In the above example,
conditional value assignment associates a variable, FIND,
with a pattern ('I' | 'E'), such that when pattern alternate
'I' matches the I in 'RELIVE', FIND is assigned 'I'.

2.1.3 Arithmetic

Arithmetic facilities are limited in SNOBOL4. Addi-
tion, subtraction, multiplication, division, and exponentia-
tion of integer and real numbers may be done. Version 3 of
SNOBOL4 permits mixed-mode expressions and real exponents.

2.1.4 Functions

There are several built-in or primitive functions in
SNOBOL4. For example, SIZE(X) returns the number of charac-
ters in string X and TRIM(X) removes all trailing blanks in
string X. REPLACE(X,Y1,Y2) replaces all occurrences of Y1
in string X by Y2. Several primitive functions are useful
for pattern matching. LEN(X), where X is an integer, has a
value of a pattern that matches any string of length X. The
statement

     'RELIVE' LEN(1) . A
results in A being assigned the value 'R'. SPAN(X) and
BREAK(X), where X is a string of characters, will match runs
of the characters of X in the subject. TAB(integer) and
RTAB(integer) allow matching attempts to be started at a
desired position in the subject. ARB (no argument) matches
an arbitrary number of characters in the subject. For
instance, in the pattern matching statement

     'THE PICTURE ON THE WALL' 'PICTURE' ARB 'WALL'
ARB matches ' ON THE '. There is also a cursor position
operator @ to assign the position in the subject where a
match occurred. After execution of the following statement
PTR will be assigned the position just before
'PICTURE'.

     'THE PICTURE ON THE WALL' @PTR 'PICTURE' ARB 'WALL'
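
The effect of LEN, ARB, and the cursor position operator can be approximated with Python regular expressions. The correspondence is loose and is an assumption of this sketch: a single dot stands in for LEN(1), a non-greedy group for ARB, and the start index of the match for the value assigned by @PTR.

```python
import re

subject = "THE PICTURE ON THE WALL"

# LEN(1) . A : match any one character and keep what was matched.
a = re.search(r".", "RELIVE").group(0)

# 'PICTURE' ARB 'WALL' : ARB is modelled by a non-greedy "any characters" group.
match = re.search(r"PICTURE(.*?)WALL", subject)
arb = match.group(1)

# @PTR : the number of characters preceding the point where the match began.
ptr = match.start()

print(repr(a), repr(arb), ptr)
```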

A second type of function in SNOBOL4 is the predicate.
If the condition specified by the predicate is satisfied,
the predicate is replaced by the null string. If the
condition is not satisfied, the statement fails and no
operation is performed. The statement

     I = LE(I,9) I + 1                    :F(END)

will succeed, adding 1 to I, so long as I is less than or
equal to 9. The numeric predicates include LT, LE, GT, GE,
EQ, and NE, whose meanings are what one would expect.
INTEGER(X) determines whether X is an integer. Other
predicates compare two strings instead of two numbers. For
example, LGT(X,Y) succeeds if string X follows string Y in
lexical ordering.

The third type of function is a function defined by the 
user. These functions may be redefined during program 
execution. No special notation is required for recursive
function calls.

2.1.5 Other Features

Other features of the language include data type
conversion, indirect referencing, delayed evaluation of
expressions in patterns, and the possibility of changing the
way the subject is scanned for a pattern.

SNOBOL4 programs are translated into Polish prefix
object code, and then executed interpretively. This helps
explain the good trace facilities in the language.

Some of the differences between SNOBOL4 and the earlier
SNOBOL and SNOBOL3 include improvements to I/O and arithmet-
ic capabilities. Also, the array data type was not present
in SNOBOL. There was no alternation operator in the earlier
languages, so patterns had to be less intricate. A large
number of the primitive functions which help in doing
complicated pattern matching problems were not present in
the earlier languages.



2.2 TRAC 

TRAC is an entirely different kind of string handling
language from SNOBOL4. It is a macrogenerator language
designed to be interactive. Wegner [21] says that a macro
definition may be viewed as a function definition f, such
that for every set of actual parameters a[1],...,a[n] in the
allowed domain, a value string f(a[1],...,a[n]) is deter-
mined which consists of the string generated as a result of
macro expansion. In macro assemblers the domain of actual
parameters consists of any strings that result in well-
formed lines of code, where the lines of code are the range
of the function. However, in TRAC the domain and range of
arguments are to some extent arbitrary strings.

The two people responsible for the development of TRAC
are Calvin Mooers and Peter Deutsch. The TRAC system was
designed for interactive text processing. Sources of the
TRAC concepts came from COMIT, LISP, and McIlroy's macro
assembly system [5]. TRAC was developed independently from
Strachey's GPM [21], although the languages are very simi-
lar. TRAC is discussed in [5], [14], [15], [16], [20], and

2.2.1 TRAC Instructions

The basic instruction format is:

     :(FCN,p[1],p[2],...,p[k])'
:( indicates a call to FCN, where FCN is a two-letter TRAC
primitive (or evaluates to a two-letter TRAC primitive).
The arguments of FCN are p[1],p[2],...,p[k]; each p[i] is a
string of characters. An activation symbol, usually the
apostrophe, indicates the end of input and causes the
processor to execute what was just entered. FCN is also
referred to as a macro name.

Instructions are executed interpretively by consulting
a table in memory for the name of the primitive and then
transferring to a subroutine for executing the primitive. A
new primitive is added to the language by adding it to the
table. However, no new primitive can be specified within a
TRAC program. It must be entered before execution.

An instruction is executed by replacing the instruction
with its value, which may be the null string. Instructions
may cause side effects in the memory, I/O medium, or
information which determines the mode of operation of the
TRAC processor.



2.2.2 TRAC Primitives 

TRAC primitives include, first of all, primitives which
allow the language to be interactive.

:(RS)' indicates a string of characters is to be read
from the typewriter until an end of string character is
found, and that this instruction is to be replaced with what
was just read.

:(PS,string)' prints the value of string. For example,

     :(PS,IT IS RAINING)'
prints IT IS RAINING. After printing, the null string is
left as the value of the instruction.

Macro definition is accomplished with the define string
primitive. :(DS,name,string) says to evaluate name, evalu-
ate string, and define the value of string to have as its
name the value of name. For instance,

     :(DS,A,:(RS))'

causes a string to be read, evaluated, and the result named
A.

Macros are called with the call primitive. :(CL,name)
says to call the name to which the name expression evaluates
and replace the instruction with the name's value. Thus,
the new string could be a new instruction.

Parameters may be introduced in a defined string with
the segment string primitive. :(SS,name,p[1],p[2],...,p[k])
says to evaluate name, evaluate the parameters p[i], and
call the named string and replace each instance in it of
p[i] by a parameter marker for i. The string is stored
back in memory. For example, consider

     :(SS,A,RAIN)'

If A has the value IT IS RAINING, then RAIN is replaced by a
parameter marker. To see this new form for A, the print
form primitive may be used. The value of :(PF,A)' would be
IT IS <1>ING. Parameters may be replaced with actual para-
meters. :(CL,name,a[1],a[2],...,a[m]) replaces all occur-
rences of parameter markers with the corresponding actual
parameters a[1],a[2],...,a[m]. If the number of actual
parameters is less than the number of parameter markers,
i.e. m<k, then null strings replace the remaining parameter
markers. If m>k, then a[k+1],...,a[m] are ignored. The
instruction

     :(PS,:(CL,A,SNOW))'
prints the value of :(CL,A,SNOW), which is IT IS SNOWING.
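
The define, segment, print form, and call primitives can be modelled by a small Python class. This is a sketch under stated assumptions, not a description of any TRAC processor: the class name Forms and the internal representation (a list mixing literal pieces with integer parameter markers) are invented for the example.

```python
class Forms:
    """Hypothetical store of TRAC-like named strings (forms)."""

    def __init__(self):
        self.store = {}

    def ds(self, name, text):
        # Define string: a form is a list of literal pieces (str) and
        # parameter markers (int, numbered from 1).
        self.store[name] = [text]

    def ss(self, name, *params):
        # Segment string: replace each occurrence of the i-th parameter
        # by parameter marker i.
        pieces = self.store[name]
        for number, param in enumerate(params, start=1):
            if not param:
                continue                      # a null parameter marks nothing
            new_pieces = []
            for piece in pieces:
                if isinstance(piece, int):
                    new_pieces.append(piece)
                    continue
                for k, part in enumerate(piece.split(param)):
                    if k:
                        new_pieces.append(number)
                    if part:
                        new_pieces.append(part)
            pieces = new_pieces
        self.store[name] = pieces

    def pf(self, name):
        # Print form: show markers as <i>.
        return "".join(f"<{p}>" if isinstance(p, int) else p
                       for p in self.store[name])

    def cl(self, name, *actuals):
        # Call: fill markers with actual parameters; missing actuals become
        # null strings and surplus actuals are ignored.
        return "".join(
            (actuals[p - 1] if p <= len(actuals) else "") if isinstance(p, int) else p
            for p in self.store[name])


forms = Forms()
forms.ds("A", "IT IS RAINING")
forms.ss("A", "RAIN")
print(forms.pf("A"))
print(forms.cl("A", "SNOW"))
```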

2.2.3 Evaluation Modes

TRAC has three different evaluation modes: active,
neutral, and quote.

The characters :( initiate the active mode. These
symbols cause the interpreter to delay evaluation of the
current function (if there is one) and evaluate all argu-
ments following :( until the matching right parenthesis is
found. For instance, in evaluating :(PS,:(CL,A,SNOW))',
execution of the print string function is delayed until
:(CL,A,SNOW) is executed. The string produced as a result
of evaluating the active function is evaluated again, unless
it is the null string.

The characters ::( initiate the neutral mode. The
difference between this and the active mode is that after
the characters between ::( and matching ) are evaluated once
and a resulting string produced, the resulting string is not
rescanned.

The quote mode, initiated by (, stops all evaluation of
what is between the matching parentheses. Examples 1, 2,
and 3 below show the differences among the three modes.

Assume these definitions are made for X and Y:

     :(DS,X,BOOK)           X has value BOOK
     :(DS,Y,(:(CL,X)))      Y has value :(CL,X)

Then

     1. :(PS,:(CL,Y))       prints BOOK
     2. :(PS,::(CL,Y))      prints :(CL,X)
     3. :(PS,(:(CL,Y)))     prints :(CL,Y)
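
The three modes can be demonstrated with a deliberately small Python evaluator. It is a sketch under simplifying assumptions, not the TRAC algorithm itself: only DS, CL, and PS are modelled, the activation symbol is omitted, recursion replaces the two stacks, and commas produced by rescanning an active result are treated as ordinary characters. All function and variable names are hypothetical.

```python
def matching(text, i):
    """Return the index of the ')' that closes the '(' at text[i]."""
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "(":
            depth += 1
        elif text[j] == ")":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced parentheses")


def call_primitive(args, state):
    """Execute one primitive; only DS, CL and PS are modelled."""
    name, rest = args[0], args[1:] + ["", ""]
    if name == "DS":
        state["forms"][rest[0]] = rest[1]
    elif name == "CL":
        return state["forms"].get(rest[0], "")
    elif name == "PS":
        state["printed"].append(rest[0])
    return ""


def scan(text, state, i=0, in_call=False):
    """Scan text from index i; return (argument list, next index)."""
    args = [""]
    while i < len(text):
        neutral = text.startswith("::(", i)
        if neutral or text.startswith(":(", i):
            inner, i = scan(text, state, i + (3 if neutral else 2), True)
            value = call_primitive(inner, state)
            if not neutral:                  # active mode: rescan the result
                value = scan(value, state)[0][0]
            args[-1] += value
        elif text[i] == "(":                 # quote mode: copy without evaluating
            j = matching(text, i)
            args[-1] += text[i + 1:j]
            i = j + 1
        elif in_call and text[i] == ",":
            args.append("")
            i += 1
        elif in_call and text[i] == ")":
            return args, i + 1
        else:
            args[-1] += text[i]
            i += 1
    return args, i


state = {"forms": {}, "printed": []}
for line in [":(DS,X,BOOK)", ":(DS,Y,(:(CL,X)))",
             ":(PS,:(CL,Y))", ":(PS,::(CL,Y))", ":(PS,(:(CL,Y)))"]:
    scan(line, state)
print(state["printed"])
```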

Two stacks are necessary during evaluation, the active 
string stack and the neutral string stack. Every instruc- 
tion is copied to the top of the active string stack and 
then scanned. Since parameters may also call TRAC func- 
tions, a stack is needed in which to put intermediate 
results of parameter evaluation. Thus, the necessity arises 
for the neutral string stack. A flowchart of TRAC evalua-
tion (Figure 2.1) follows [15].

2.2.4 Arithmetic Primitives

TRAC has primitives to handle the usual arithmetic
operations. For example, :(AD,d1,d2)' returns the sum of d1
and d2, which are strings representing numbers.

2.2.5 Decision Primitives

Two primitives EQ (equals) and GR (greater) provide
decision facilities. The value of :(EQ,x1,x2,t,f) is t if
character string x1 is equal to character string x2,
otherwise the value is f. Similarly, :(GR,d1,d2,t,f) is t
if d1 is greater than d2. GR's operands d1 and d2 must be
strings representing numbers, not character strings.

2.2.6 Character Primitives

Each defined string (or form) has a form pointer
associated with it. Initially the form pointer points to
the first character of the string; it may be moved by four
primitives: CC (call a character), CN (call a number of
characters), CS (call a segment), and IN (index). The value
of the instruction

     :(CC,S,z)'

is the character in S pointed to by S's form pointer. As a
side effect, the form pointer of S is moved ahead one
character. If, before :(CC,S,z)' is executed, the form
pointer of S points beyond the last character of S, the
value of the instruction is z. Similarly, if the form
pointer is beyond the last character of S, the value of
:(CN,S,k,z)' is z. Otherwise the value returned is the next
k characters of S after the form pointer. The form pointer
is moved ahead (or back if k is negative) k places.
:(CS,S,z)' gives the segment of characters from the current
position of the form pointer to the next parameter marker.
:(IN,S,x,z)' searches S for substring x. If the substring
is present, the value that is returned is the string between
the beginning position of the form pointer and the matched
string; the form pointer is moved to the character after the
matched string. If there is no match, z is returned. The
cursor-reset or call-restore function :(CR,S) resets the
form pointer of S to the first character in S.

Figure 2.1 TRAC Algorithm.
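
The form pointer primitives CC, CN, IN, and CR can be modelled in Python as methods on an object that holds a string and an index. This is a sketch under assumptions: the class and method names are hypothetical, CS is omitted because parameter markers are not modelled here, and CN is shown for non-negative k only.

```python
class Form:
    """Hypothetical model of a TRAC form with its form pointer."""

    def __init__(self, text):
        self.text = text
        self.pointer = 0                      # index of the next unread character

    def cc(self, z):
        """Call a character: return it and advance, or return z at the end."""
        if self.pointer >= len(self.text):
            return z
        char = self.text[self.pointer]
        self.pointer += 1
        return char

    def cn(self, k, z):
        """Call k characters (k >= 0 only in this sketch)."""
        if self.pointer >= len(self.text):
            return z
        chars = self.text[self.pointer:self.pointer + k]
        self.pointer = min(self.pointer + k, len(self.text))
        return chars

    def index(self, x, z):
        """IN: return the text between the pointer and the next x, or z."""
        found = self.text.find(x, self.pointer)
        if found < 0:
            return z
        skipped = self.text[self.pointer:found]
        self.pointer = found + len(x)         # pointer lands just after the match
        return skipped

    def cr(self):
        """Call restore: reset the form pointer to the first character."""
        self.pointer = 0


s = Form("IT IS RAINING")
print(s.cc("END"), s.cn(4, "END"), repr(s.index("RAIN", "NONE")), s.cn(3, "END"))
```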

Some other functions useful in string processing are
mentioned by van der Poel in [20]. One is the yes there
function, :(YT,N,x,t,f). If string x is in N, then the
value of the function is t, otherwise the value is f.
:(LP,N) and :(RP,N) give, respectively, the number of
characters to the left of the form pointer and to the right
of the form pointer. Another function, IL (in left), is
like IN but searches to the left in x. :(LG,x1,x2,t,f)
determines whether string x1 is lexically greater than
string x2. If so, the value returned is t; if not, the
value returned is f.

A character primitive combined with EQ can move the 
form pointer ahead and return a null string as result. For 
example, 

     :(EQ,:(CS,SENT),)'
moves the pointer after a segment of SENT. There are no 
true-false exits, and a null string is returned. 

2.3 APL 

APL was originated by Kenneth Iverson. It was devel- 
oped further in association with A.D. Falkoff. Discussions 
of APL may be found in [11] and [12]. 

APL is a general purpose programming language whose 
concise notation is good for interactive use. APL is 
particularly useful in dealing with vectors and multidimen- 
sional arrays. The APL discussed in the thesis is the 
implementation used in an APL/360 interactive system. The 
implementation provides a good repertoire of system action 
commands; these will not be discussed. 

The double arrow (←→) will be used in the following
discussion to denote equivalence. This symbol is not part
of APL but merely a notational convenience.

2.3.1 Data Types

As in SNOBOL4, there are no declarations of type of a
variable. The only types are numbers and characters.

A scalar may be a number or a character. An array is 
built from scalars of the same type. Thus, an array cannot 
contain both numbers and characters. 

A character string is a one-dimensional array of 
characters. Thus, any operation on the string is performed 
on each element individually. The importance of this 
feature is illustrated in Chapter 4. There is no conversion 
between characters and numbers. 

2.3.2 Statements

The branch and the specification statements are the two
basic statement types. Branch statements are used only in
user-defined functions. Their explanation will be deferred
until defined functions are discussed. Examples of specifi-
cation statements are:

     X←5÷2
     Y←'I AM A STRING'
     Z←1 2 3 4
     Z1←3×5+2
     Z2←(3×5)+2

Specification statements assign to the variable on the left
hand side of the arrow the result of evaluating the
expression on the right hand side of the arrow. In the
examples given previously, X is assigned 2.5, Y is assigned
the character vector inside the apostrophes (each character
in the vector is an element of vector Y), and Z is assigned
a vector with the first four integers as elements. Two
elements of a vector of numbers, not a vector of characters,
are separated for input and output by at least one blank.
The value of Z1 is 21, not 17, because order of execution is
right to left. However, Z2 does have value 17 since
parentheses are used.
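
The right-to-left rule can be made concrete with a few lines of Python. The sketch assumes a flat token list with no operator precedence; the function name and the spelling of the operators ("x" for multiplication, "/" for division) are choices made for the example and are not APL notation.

```python
import operator

# Hypothetical operator table for the sketch.
OPS = {"+": operator.add, "-": operator.sub,
       "x": operator.mul, "/": operator.truediv}


def eval_right_to_left(tokens):
    """Evaluate [operand, op, operand, op, ...] strictly from right to left."""
    value = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        value = OPS[tokens[i]](tokens[i - 1], value)
    return value


z1 = eval_right_to_left([3, "x", 5, "+", 2])                       # 3 x (5 + 2)
z2 = eval_right_to_left([eval_right_to_left([3, "x", 5]), "+", 2])  # (3 x 5) + 2
print(z1, z2)
```

The parenthesized case is handled by evaluating the inner expression first, which mirrors the way parentheses override the right-to-left order in Z2.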



2. 3.3 Indexing Jrrajrs 



[i] written after a vector, or [i;j] written after an 
array (i or j possibly omitted), are called indices or index 
functions. Like subscripts in other languages, the indices 
are used to reference elements of vectors and arrays. For 
example, suppose 

B←'I AM A STRING' 

and C is the array 

1 2 
3 4 
5 6 

Then 

C[1;2]↔2 
C[1;]↔1 2 
C[;2]↔2 4 6 
C[2 3;2]↔4 6 
C[1 2;1 2]↔1 2 
           3 4 

As illustrated above, the indices (subscripts) inside the 
brackets may be scalars, vectors, or arrays. 

2.3.4 Functions 

There are two kinds of functions (or operators): 
primitive functions, which are built into the system, and 
defined functions, which are defined by the user. Primitive 
functions will be considered first. 

Every primitive function is either monadic (one 
argument) or dyadic (two arguments). Whether an argument may 
be a scalar, a vector, or an array depends on the function 
used. The form of the function result, i.e. scalar, vector, 
or array, depends on the type of arguments used. (A scalar 
is not considered to be a vector of length one.) 

Primitive functions are considered to be either scalar 
or mixed. Scalar functions are those which return a scalar 
result for scalar arguments. However, their arguments may 
be arrays, which are operated on element by element by the 
function. The shape of the result is the same as that of 
one of the arguments. For example, suppose S1←1 2 3 4 and 
S2←5 6 7 8. To evaluate S1+S2, the addition operator is 
applied to corresponding elements in the two vectors, 
yielding the result 6 8 10 12. If S1 or S2 is a scalar, 
then the scalar is paired with every element of the vector 
in evaluating the function. If S3←5 then RESULT←S3+S2 or 
RESULT←S2+S3 assigns to RESULT the vector value 

10 11 12 13. 
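
The element-by-element rule and the pairing of a scalar with 
every element of a vector can be imitated outside APL. The 
following Python fragment is only an illustrative sketch; the 
name scalar_add is hypothetical, and a Python list stands in 
for an APL vector. 

```python
from numbers import Number


def scalar_add(a, b):
    """Add two arguments the way an APL scalar function would.

    A scalar argument is paired with every element of a vector
    argument; two vectors must have the same length.
    """
    a_is_scalar = isinstance(a, Number)
    b_is_scalar = isinstance(b, Number)
    if a_is_scalar and b_is_scalar:
        return a + b
    if a_is_scalar:
        return [a + y for y in b]
    if b_is_scalar:
        return [x + b for x in a]
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


s1 = [1, 2, 3, 4]
s2 = [5, 6, 7, 8]
print(scalar_add(s1, s2))  # element by element
print(scalar_add(5, s2))   # scalar paired with every element
```

The two calls correspond to S1+S2 and S3+S2 in the text. 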

Many function symbols are used to represent two 
different functions in APL. The meaning of the symbol depends 
on the number of arguments it has. For example, ⌈ is the 
ceiling (next integer not less than) function when used 
monadically (with one argument) and the maximum function 
when used dyadically (with two arguments). For example, 
⌈3.5↔4 and 3⌈3.5↔3.5 

APL has relational operators which take scalar 
arguments and whose results are 1 if the relation holds for 
the arguments and 0 otherwise. For example, 

3≥4↔0    ¯2<3↔1    'M'='N'↔0 

Scalar relation functions equals and not equals may be used 
with character arguments, but the other relations cannot. 
The logical functions or, and, etc. take logical arguments 
(0's and 1's) and return 0 or 1 as value. For example, if 

A←1 0 1 0    B←1 1 0 0 

then 

A∧B↔1 0 0 0    A∨B↔1 1 1 0    ~A↔0 1 0 1 

Any dyadic scalar function symbol may be followed by a 
reduction symbol /. This has the effect of applying the 
function between successive components of the argument. 
For example, +/X says to add together every component 
of vector X. Reduction may also be used along any 
coordinate of an array. 

Mixed functions may be defined on numbers or 
characters. The shape of the result is not necessarily the 
shape of one of the arguments. A mixed function must have a 
non-scalar either as an argument or as a result. An example 
of a non-scalar or mixed dyadic function is catenate, 
symbolized by a comma. This function says to concatenate 
its two arguments. For example, 'AB','CD'↔'ABCD'. If X is 
assigned 'AB' and Y is assigned 'CD', then X,Y↔'ABCD' also. 

Some of the more useful mixed functions will now be 
explained. These explanations may need to be referenced 
when reading later chapters. 

2.3.4.1 Index Generator 

If N>0, ⍳N is a vector whose elements are the first N 
integers. For example, 

⍳5↔1 2 3 4 5 

⍳0 is the null vector; it prints as a blank. 

2.3.4.2 Index of 

The dyadic use of iota, A⍳B, is very important in 
string handling problems. A⍳B gives the least index of the 
occurrence of each element of B in A, where A must be a 
vector. If an element of B does not occur in A, then the 
function returns 1 plus the highest index of A. Suppose 

B←'A S' 
A←'I AM A STRING' 

Then 

A⍳B↔3 2 8 

2.3.4.3 Size 

If X is a vector, then ⍴X gives the number of elements 
in X. If X is an array, ⍴X gives the dimensions of X in the 
form of a vector result. For example, if X is array 

1 2 
3 4 
5 6 

then ⍴X↔3 2, denoting three rows and two columns in X. 
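
The behavior of dyadic iota and of the size function can be 
imitated in a few lines of Python. This is a sketch only: the 
names index_of and size are hypothetical, indices are counted 
from 1 as in the text, and a list of rows stands in for a 
two-dimensional array. 

```python
def index_of(a, b):
    """For each element of b, give its least 1-origin index in a.

    An element that does not occur in a yields len(a) + 1,
    i.e. one plus the highest index of a.
    """
    not_found = len(a) + 1
    result = []
    for item in b:
        try:
            result.append(a.index(item) + 1)
        except ValueError:
            result.append(not_found)
    return result


def size(x):
    """Length of a vector, or [rows, columns] of a list of rows."""
    if len(x) > 0 and isinstance(x[0], (list, tuple)):
        return [len(x), len(x[0])]
    return len(x)


print(index_of("I AM A STRING", "A S"))
print(size([[1, 2], [3, 4], [5, 6]]))
```

The first call mirrors the A⍳B example with A the string 
'I AM A STRING'; the second mirrors the size of a 
three-row, two-column array. 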

2.3.4.4 Reshape 

The dyadic function ⍴ can create an array. In such 
usage the first argument specifies the dimensions the array 
is to have. The second argument specifies a vector of 
elements to be in the array. The statement 

A←2 3⍴1 2 3 4 5 6 

defines A to be array 

1 2 3 
4 5 6 

2.3.4.5 Ravel 

The comma (,) used monadically rewrites an array as a 
vector. Hence, B←,A assigns B the value 1 2 3 4 5 6. 
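
Reshape and ravel are inverse views of the same data, which 
a short Python sketch can make concrete. The names reshape 
and ravel are hypothetical, and only the two-dimensional case 
is handled. Reusing the elements cyclically when there are 
too few of them is how APL reshape behaves, although the text 
above does not rely on it. 

```python
from itertools import cycle, islice


def reshape(rows, cols, elements):
    """Build a rows-by-cols list of rows from a vector of elements.

    Elements are taken in order and reused cyclically if the
    vector is shorter than rows * cols.
    """
    source = cycle(elements)
    return [list(islice(source, cols)) for _ in range(rows)]


def ravel(array):
    """Rewrite a list of rows as a single vector, row by row."""
    return [item for row in array for item in row]


a = reshape(2, 3, [1, 2, 3, 4, 5, 6])
print(a)
print(ravel(a))
```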

2.3.4.6 Membership 

The membership function ∊ takes two arguments; it 
yields a logical array that has the dimensions of the first 
argument. The result has ones in the positions where 
elements of the first argument are members of the second 
argument, and zeroes in all other positions. For example, 

(⍳5)∊2↔0 1 0 0 0 

and 

'I AM A STRING'∊'AEIOU'↔1 0 1 0 0 1 0 0 0 0 1 0 0 

Parentheses are necessary around ⍳5 in this example because 
of the right to left rule for function evaluation. 

2.3.4.7 Compress and Expand 

Compress and expand operators used with two arguments 
are represented by the forward slash and the backward slash, 
respectively. A logical vector may be used to compress or 
expand a vector or array. In compressing character arrays, 
characters in the second argument are deleted at the 
positions where there are zeroes in the first argument. No 
changes are made in the positions in the second argument 
where there are ones in the first argument. In expanding 
character arrays, the result is the same as the second 
argument but with blanks inserted in positions where zeroes 
appear in the first argument. For example, suppose 
V←1 0 0 1 and A←'ROAM' and B←'RM'. Then V/A↔'RM' and 
V\B↔'R  M'. 

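Membership, compress, and expand are often used together in 
string handling, so a combined sketch may be useful. The 
Python below is illustrative only; member, compress, and 
expand are hypothetical names, a list of 0's and 1's stands 
in for a logical vector, and a Python string stands in for a 
character vector. 

```python
def member(a, b):
    """1 where an element of a occurs in b, 0 elsewhere."""
    return [1 if item in b else 0 for item in a]


def compress(mask, text):
    """Delete the characters of text where mask holds a zero."""
    if len(mask) != len(text):
        raise ValueError("mask and text must have the same length")
    return "".join(ch for bit, ch in zip(mask, text) if bit)


def expand(mask, text):
    """Insert a blank wherever mask holds a zero."""
    if sum(mask) != len(text):
        raise ValueError("mask must hold one 1 per character of text")
    chars = iter(text)
    return "".join(next(chars) if bit else " " for bit in mask)


v = [1, 0, 0, 1]
print(compress(v, "ROAM"))
print(repr(expand(v, "RM")))
vowels = member("I AM A STRING", "AEIOU")
print(compress(vowels, "I AM A STRING"))
```

The last two lines combine membership with compression to 
pick the vowels out of a string. 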
2.3.5 Defined Functions 

Defined functions are used to extend the language. The 
following is an example of a function definition. 

∇DIM                    FUNCTION HEADER 
[1] SUM←(⍴A)+⍴B 
[2] AVER←SUM÷2          FUNCTION BODY 
∇ 

The del (∇) character before DIM indicates the beginning of 
the function definition mode. The last del ends function 
definition. DIM is the name of the function to be defined; 
[i] stands for statement number i. The statements 
constitute the function body. After function definition the 
body is associated with function name DIM. DIM could be 
called by: 

A←'SIZE' 
B←'SIZE1',A 
DIM 
AVER 

DIM calculates the average size of A and B. Since A 
contains four characters and B contains nine characters, the 
value 6.5 is printed. DIM can be rewritten to have two 
arguments. The function header would be changed to 

∇A DIM B 

The function might then be called by 

Z←'SIZE' 
Z DIM 'SIZE1',Z 

Again 6.5 is the result. 

The basic format of a branch statement is →I. If I is 
a number or a label, the program branches to the 
corresponding statement in the function definition. If I is 
the null vector, the next instruction in statement number 
order is executed. If I=0 the execution is finished. 

Branch statements are used in the following function 
definition. 

∇ Z←DOUBLE STR 
[1] Z←'' 
[2] LOOP:Z←Z,2⍴1↑STR 
[3] STR←1↓STR 
[4] →(0<⍴STR)/LOOP 
∇ 

The above function DOUBLE doubles every letter of STR. Z is 
assigned the null string in statement 1. In statement 2 Z 
is concatenated with two copies of 1↑STR, the first 
character of STR. The first character of STR is dropped 
from STR in statement 3. Statement 4 causes a branch to the 
statement labelled LOOP if there is at least one more 
character in STR; otherwise the program stops. 
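
For readers unfamiliar with APL branching, the same loop can 
be written in Python. This is a sketch, not a translation 
produced by any tool; the name double is hypothetical, and 
the loop is arranged so that the test comes at the bottom, as 
it does in statement 4. 

```python
def double(text):
    """Double every character of text, one character per pass."""
    z = ""                      # statement 1: start with the null string
    while True:
        z = z + 2 * text[:1]    # statement 2: two copies of the first character
        text = text[1:]         # statement 3: drop the first character
        if len(text) == 0:      # statement 4: loop while characters remain
            break
    return z


print(double("ABC"))
print(double("ABC") + double("X") + "Y")
```

Because the function returns its result, it can be used 
inside a larger expression just as a built-in would be. 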

Suppose DOUBLE is used in a statement, for example, 

STRING←(DOUBLE 'ABC'),(DOUBLE 'X'),'Y' 

Then STRING will have the value 'AABBCCXXY'. 

The previous example illustrates that a defined 
function does not have to be referred to any differently 
from a primitive function. This means that a defined 
function may also appear in other function definitions. 

Some defined functions are included in libraries 
available to the user. Recursive function definitions are 
allowed. Also, APL/360 allows functions to be traced as 
they are being executed and function definitions to be 
changed. 

2.4 PL/I 

PL/I is a general purpose programming language that can 
be used for a wide variety of problems. The original 
specifications for PL/I were written by the Advanced 
Language Development Committee of the SHARE FORTRAN Project, 
a group formed by SHARE and IBM. 

PL/I contains many of the features of COBOL, FORTRAN, 
and ALGOL. Also, to some extent, PL/I was influenced by 
APL. 

An important feature of PL/I is its modularity. The 
language is such that a user need only learn that subset of 
PL/I applicable to his problems. 

PL/I is discussed in [2] and [18]. 

2.4.1 Data Types 

Data fall into the categories of problem data and 
program control data. The latter category will not be 
discussed. Problem data may be divided into arithmetic data 
and string data. Attributes of a variable are declared in a 
DECLARE statement anywhere in the program. However, if any 
attribute is not declared explicitly, a default attribute is 
assigned. 

Attributes of arithmetic variables are BASE (binary or 
decimal) , SCALE (fixed or floating point), MODE (real or 
complex) , and precision. 



String data may be either character or bit strings. 
All string operations and functions may be performed on 
either kind. Strings may be declared to be of fixed or 
varying lengths. However, a maximum length must still be 
specified for a varying length string. 

Both arithmetic data and string data may be organized 
into arrays and structures. A structure may contain both 
arithmetic and string variables, whereas all elements of an 
array must have identical attributes. 

2.4.2 Block Structure 

An important characteristic of PL/I is its block 
structure. Blocks are groups of statements that delimit the 
scope of variables. There are two kinds of blocks, 
procedures and BEGIN blocks. 

Procedures are subroutines which are activated 
explicitly by being invoked. They may be passed parameters. 
BEGIN blocks are activated implicitly by being reached. No 
parameters are passed to BEGIN blocks. 

2.4.3 Statement Types 

PL/I has several different statement types. These 
include descriptive statements, such as DECLARE; I/O 
statements, such as GET and PUT; data movement and 
computational statements, such as assignment statements; 
program structure statements, such as PROCEDURE, BEGIN; and 
control statements, such as GO TO, IF, DO, CALL, RETURN. IF 
and GO TO statements provide, respectively, conditional and 
unconditional branching. IF statements can be quite complex. 
DO groups, delimited by DO and END statements, are used for 
control purposes; they can specify how many times and under 
what conditions a group of statements is to be executed. 
Some of the statements will be illustrated in the program 
following the PL/I discussion. 

2.4.4 String Capabilities 

Since PL/I has been influenced by FORTRAN, COBOL, and 
ALGOL, it is not usually considered a language in which to 
do string manipulation problems. However, there are several 
features of PL/I which permit fairly good string processing. 
In this respect PL/I differs from most general purpose 
programming languages. 

Rosin has discussed these useful string features in a 
1967 article [18]. Strings may be declared to be of fixed 
or varying length; fixed length is the default. String 
constants are delimited by apostrophes, e.g. 

'I AM A STRING'. 

Strings may be concatenated using the operator ||. The 
function LENGTH(string) returns the size of string. The 
relational operator equals (=) may be used to compare two 
strings. Also, all of the other relational operators can be 
used on string operands. The result depends on the 
collating sequence of the character codes. A replication 
factor may be placed before a character string constant, but 
not before the name of a character string. The factor, which 
is a constant, indicates how many times the character string 
constant is to be repeated. 

The two extremely useful built-in string functions are 
SUBSTR and INDEX. SUBSTR(string,i,n) gives the n character 
long substring of string that begins in position i. If n is 
absent, then the rest of string from character i on is 
given. SUBSTR may also appear on the left hand side of 
assignment statements as a pseudo-variable, thus allowing 
values to be assigned to substrings. For example, the 
statement 

SUBSTR(STR,3,9) = 'ABCDEFGHI'; 

replaces the third through eleventh positions of STR with 
the first nine letters of the alphabet. The INDEX function 
essentially does SNOBOL-like matching of a simple pattern. 
INDEX(string,substring) finds the left-most occurrence of 
substring in string. The position of the first character in 
the matched portion of string is returned, and 0 is returned 
if substring is not contained in string. This is a 
generalization of the iota operator of APL. 
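
The position conventions of SUBSTR and INDEX (positions 
counted from 1, and 0 meaning "not found") are easy to get 
wrong when moving between languages. The Python sketch below 
imitates them; substr, assign_substr, and index are 
hypothetical helper names, and assign_substr returns a new 
string because Python strings cannot be changed in place. 
Requiring the replacement to be exactly n characters long is 
a simplification made here, not a statement about PL/I. 

```python
def substr(string, i, n=None):
    """Substring of length n starting at 1-origin position i.

    If n is omitted, the rest of the string from position i on.
    """
    start = i - 1
    if n is None:
        return string[start:]
    return string[start:start + n]


def assign_substr(string, i, n, value):
    """Imitate SUBSTR used as a pseudo-variable.

    Returns a copy of string with positions i .. i+n-1 replaced
    by value, which must be exactly n characters long.
    """
    if len(value) != n:
        raise ValueError("value must be exactly n characters long")
    start = i - 1
    return string[:start] + value + string[start + n:]


def index(string, substring):
    """1-origin position of the left-most occurrence, or 0."""
    return string.find(substring) + 1  # find gives -1 when absent


print(substr("I AM A STRING", 8))
print(index("I AM A STRING", "STR"))
print(assign_substr("abcdefghijklm", 3, 9, "ABCDEFGHI"))
```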

There are other string functions as well. 
REPEAT(string,N) does essentially the same thing as a 
replication factor. However, string is not restricted to 
being a character string constant; it may be the name of a 
string. TRANSLATE(string,table1,table2) translates each 
character in string which appears in table1 to the 
corresponding character in table2. In the following example 
table1 is 'IG' and table2 is 'AD'. 

A = TRANSLATE('SING','IG','AD'); 

assigns 'SAND' to A. VERIFY(string1,string2) verifies that 
every character of string1 is present in string2. If so, 0 
is returned. If not, the position (index) of the first 
character in string1 not present in string2 is returned. 
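
The table-driven behavior of TRANSLATE and the result 
convention of VERIFY can be sketched in Python. The names 
translate and verify are hypothetical, and the argument order 
follows the description given in the text above rather than 
any particular compiler manual. 

```python
def translate(string, table1, table2):
    """Replace each character found in table1 by the character
    at the same position in table2; other characters are kept."""
    return string.translate(str.maketrans(table1, table2))


def verify(string1, string2):
    """0 if every character of string1 occurs in string2;
    otherwise the 1-origin position of the first one that does not."""
    for position, ch in enumerate(string1, start=1):
        if ch not in string2:
            return position
    return 0


print(translate("SING", "IG", "AD"))
print(verify("1234", "0123456789"))
print(verify("12A4", "0123456789"))
```

A typical use of verify is checking that an input field 
contains only digits before converting it to a number. 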

A sample PL/I program follows that counts the number of 
I's and E's in an input card. 

PR:    PROCEDURE OPTIONS (MAIN); 
       DECLARE X CHAR (25) VARYING, 
               SUM FIXED; 
START: GET LIST (X); 
       SUM = 0; 
       DO I = 1 TO LENGTH (X); 
          IF SUBSTR (X,I,1) = 'I' | 
             SUBSTR (X,I,1) = 'E' 
          THEN SUM=SUM+1; 
       END; 
       PUT LIST (SUM); 
       END PR; 

3 SAMPLE STRING HANDLING PROBLEMS 



This chapter contains two examples of easy string 
handling problems and one complex problem. These problems 
help show the different ways that basic string operations, 
which are discussed in detail in Chapter 4, are done in each 
language. Also, they use many of the language features 
discussed in Chapter 2. 

Problem 1 is sorting N strings into alphabetical order. 
Problem 2 involves listing all words that begin with a vowel 
that occur in a line of text. Problem 3 is a rather complex 
text matching problem. 

3.1 PROBLEM 1 

The following strings are to be sorted: 

CATCH 

THROW 

OUTFIELD 

BASEBALL 

BASE 

CATCHER 

A bubble sort program will be written in all four 
languages. In the first stage the bottom two strings, the 
N-1st and the Nth, are compared; the alphabetically earlier 
of the two strings is bubbled up and compared with the N-2nd 
string; the earlier of the two is bubbled up and compared 
with the N-3rd, etc., until the proper string is at the top 
of the sequence of strings. In the second stage the above 
process is repeated; the top item is not checked. After the 
second stage the top two strings are in order. The bubble 
sort continues until a stage when no two strings are 
interchanged. 
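
The stages described above can be sketched in Python. This is 
an illustration of the procedure as worded in the preceding 
paragraph, not a transcription of Figure 3.1; the name 
bubble_sort and the variable names are hypothetical, and 
Python's own string comparison supplies the alphabetical 
order. 

```python
def bubble_sort(strings):
    """Sort by bubbling the earlier string up from the bottom.

    Each stage starts at the bottom of the list and stops above
    the items already placed by earlier stages. Sorting ends
    after a stage in which no two strings are interchanged.
    """
    a = list(strings)
    n = len(a)
    stage = 0
    interchanged = True
    while interchanged and stage < n - 1:
        interchanged = False
        # Compare neighbours from the bottom up to the current stage.
        for j in range(n - 1, stage, -1):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                interchanged = True
        stage += 1
    return a


words = ["CATCH", "THROW", "OUTFIELD", "BASEBALL", "BASE", "CATCHER"]
for word in bubble_sort(words):
    print(word)
```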

The flowchart, which applies to all four bubble sort 
programs, is given in Figure 3.1. The algorithm is a common 
form for a bubble sort and is found in reference [6]. It 
was relatively easy on the basis of programming time to 
write the PL/I, SNOBOL4, and APL programs from the 
flowchart. The TRAC program took more time to code. A bubble 
sort is not the best method for sorting in APL, so an 
alternate method is also given in the chapter. 

3.1.1 SNOBOL4 

(Refer to Figure 3.2.) 

The first input card contains N, the number of strings 
to be sorted. Succeeding cards contain the strings them- 
selves. A one-dimensional array A of N items is created by 
the statement 

A = ARRAY (N) 

Each string is a member of array A. Notice that the indices 
of array elements are denoted by <>'s, not parentheses. 

* THIS PROGRAM IS SIMILAR TO THE ONE IN THE SNOBOL IV MANUAL 
* INITIALIZE STAGE NO. 
          I = 0 
* GET NUMBER OF ITEMS TO BE SORTED 
* 
          N = TRIM(INPUT)                   :F(ERROR) 
          A = ARRAY(N) 
* READ IN THE ITEMS 
* 
READ      I = I + 1 
          A<I> = TRIM(INPUT)                :F(GO)S(READ) 
* 
* SORT THE LIST 
* 
GO        I = 0 
          T = 1 
SORT2     EQ(T,0)                           :S(PR) 
          J = N 
          T = 0 
          I = I + 1 
SORT1     EQ(I,J)                           :S(SORT2) 
          J = GT(J,1) J - 1 
          LGT(A<J>,A<J + 1>)                :F(SORT1) 
SWITCH    TEMP = A<J> 
          A<J> = A<J + 1> 
          A<J + 1> = TEMP 
          T = 1                             :(SORT1) 
* PRINT SORTED LIST 
* 
PR        M = 1 
PRINT     OUTPUT = A<M>                     :F(END) 
          M = M + 1                         :(PRINT) 
* 
END 

BASE 
BASEBALL 
CATCH 
CATCHER 
OUTFIELD 
THROW 

Figure 3.2 SNOBOL4 Program for Problem 1. 

3.1.2 PL/I 

(Refer to Figure 3.3.) 

The PL/I and SNOBOL4 programs are very similar. 
However, in PL/I a maximum length for an element of A must 
be given (8 in this example). In SNOBOL4 it is not necessary 
to specify maximum lengths of array elements. 

3.1.3 APL 

(Refer to Figure 3.4.) 

Since there is no collating sequence in APL, it is 
necessary to use a string S containing the letters of the 
alphabet in order preceded by a blank for reference in 
getting the proper lexical order. 

The sequence of six strings to be sorted is stored as a 
two-dimensional array A. J is the index of the array 
element in A being considered, L indexes the position or 
column of the array member, and I is the stage number of the 
bubble sort process. In the previous examples in SNOBOL4 
and PL/I, A was a vector (one-dimensional array of character 
strings), whereas in APL it is a two-dimensional array of 
characters. 

PL/I and SNOBOL4, when comparing two strings of unequal 
lengths, left justify the shorter of the two strings and pad 
to the right with blanks. However, in APL a string is a 
vector of characters. Since the dimensionality of two 

SORT:  PROCEDURE OPTIONS (MAIN); 
       DECLARE (A(6), TEMP) CHARACTER (15) VARYING; 
       /* READ IN NUMBER OF ITEMS TO BE SORTED */ 
       GET LIST (N); 
       /* READ IN STRINGS TO BE SORTED */ 
       DO I = 1 TO N; 
          GET SKIP EDIT (A(I)) (A(15), SKIP); 
       END; 
       /* INITIALIZE VARIABLES */ 
       T = 1; 
       I = 0; 
SORT2: IF T=0 THEN GO TO PRINT; 
       ELSE DO; 
          T = 0; 
          I = I+1; 
          J = N; 
SORT1:    IF J=I THEN GO TO SORT2; 
          ELSE DO; 
             J = J-1; 
             IF A(J) <= A(J+1) THEN GO TO SORT1; 
             ELSE DO; /* INTERCHANGE ITEMS */ 
                TEMP = A(J); 
                A(J) = A(J+1); 
                A(J+1) = TEMP; 
                T = 1; /* INDICATE INTERCHANGE */ 
                GO TO SORT1; 
             END; 
          END; 
       END; 
PRINT: PUT EDIT (A) (SKIP, A(15)); 
       END SORT; 

BASE 
BASEBALL 
CATCH 
CATCHER 
OUTFIELD 
THROW 

Figure 3.3 PL/I Program for Problem 1. 



7 SORT 

Cl] I«-0 

[2] S<-' ABCDEFGRIJKLMNOPQRSTUVWXYZ ' 

[ 3] iWfoPI 

[4] h-w^e 

[ 5] M 

[6] LOOP1'MT=0)/OUT 

[7] RERE:T+-Q 

[8] Jt-I+1 

[9] ^(pi4)Cll 

[10] TEST:AJ=D/L00P1 

[11] </<-J-l 

[12] I>0 

[13] L00P2 1 

[14] -K£=(p/4)[2]+l)/2!EOT 

[15] >((5i>l[«;;Zi])^'7i>l[J+l;L])/yff5 

[16] NO-.+TEST 

[17] y£'5:-^((5\/1[«;;L])=5T/l[J+l;L])/L00P2 

[la] ru^Ui] 

[19] yl[«7;>/l[«7+l;] 

[20] ALJ+U ]+2HSMP 

[21] !M 
[22] 

[23] 0i/T:-»O 
V 



C42CT- OUTFIELDBASEBALLBASE CATCHER 

□: 

6 8 



BASE 

BASEBALL 

CATCH 

CATCHER 

OUTFIELD 

THROW 



Figure 3.4 APL Program for Problem 1. 

vectors must match to be compared, and since APL strings are 
character vectors, two APL strings must be of the same 
length to be compared. Therefore the programmer must 
provide for padding. 

The APL bubble sort program is similar to the SNOBOL4 
and PL/I versions. However, it is not the best way of 
writing a sort in APL. Since APL has such a wide variety of 
primitives, there are more concise ways to code the sort 
problem. One of these ways is found in Katzan [12]. His 
way uses the decode function ⊥ and transpose function ⍉ as 
well as the size and index functions. 

The expression R⊥X, where R is a radix and X is a 
vector of digits, denotes the value of X evaluated in a 
number system with radix R. For example, the value of 
10⊥1 2 3 is 123. Thus, if S←' ABCDEFGHIJKLMNOPQRSTUVWXYZ' 
the value of 27⊥S⍳'BC' would be (27¹×3)+(27⁰×4) or 85. 
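
Evaluating a vector of digits in a given radix is a short 
loop in most languages. The Python sketch below shows the 
arithmetic behind both examples; decode is a hypothetical 
name, and the alphabet string S begins with a blank so that 
the blank has index 1 and 'A' has index 2. 

```python
def decode(radix, digits):
    """Value of a digit vector in the given radix (Horner's rule)."""
    value = 0
    for digit in digits:
        value = value * radix + digit
    return value


S = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# 1-origin index of each character of 'BC' in S: B is 3, C is 4.
indices = [S.index(ch) + 1 for ch in "BC"]

print(decode(10, [1, 2, 3]))
print(indices, decode(len(S), indices))
```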

The expression ⍉X, where X is an array, returns the 
transpose of X. For example, if SMSTR←2 4⍴'THEYCAME' then 
⍉S⍳SMSTR is the array 

21  4 
 9  2 
 6 14 
26  6 

Now consider 

STRING←6 8⍴'CATCH   THROW   OUTFIELDBASEBALLBASE    CATCHER ' 

Then the function 

∇R←SORT STRING;ALPH 
[1] ALPH←' ABCDEFGHIJKLMNOPQRSTUVWXYZ' 
[2] R←STRING[⍋(⍴ALPH)⊥⍉ALPH⍳STRING;] 
∇ 

will order the elements of STRING. 

STRING is the array: 

CATCH 

THROW 

OUTFIELD 

BASEBALL 

BASE 

CATCHER 

Tracing through the operation step by step: 
Step 1 

The value of ALPH⍳STRING is 

 4  2 21  4  9  1  1  1 
21  9 19 16 24  1  1  1 
16 22 21  7 10  6 13  5 
 3  2 20  6  3  2 13 13 
 3  2 20  6  1  1  1  1 
 4  2 21  4  9  6 19  1 


Each row contains the indices of a row of STRING in ALPH. 
Step 2 

The value of ⍉ALPH⍳STRING is 

 4 21 16  3  3  4 
 2  9 22  2  2  2 
21 19 21 20 20 21 
 4 16  7  6  6  4 
 9 24 10  3  1  9 
 1  1  6  2  1  6 
 1  1 13 13  1 19 
 1  1  5 13  1  1 

The function ⍉ transposes the matrix obtained in Step 1. 
Step 3 

The value of (⍴ALPH)⊥⍉ALPH⍳STRING is 

4.291988451E10 
2.234358071E11 
1.761941507E11 
3.244612824E10 
3.244608781E10 
4.291988864E10 

The first number is equal to 

(27⁷×4) + (27⁶×2) + (27⁵×21) + (27⁴×4) + 
(27³×9) + (27²×1) + (27¹×1) + (27⁰×1) 

The other numbers are calculated in the same way. 

Step 4 

The function ⍋ (grade up) assigns ranks to the elements of 
(⍴ALPH)⊥⍉ALPH⍳STRING; that is, it gives the indices of the 
elements in ascending order of value. The value of 
⍋(⍴ALPH)⊥⍉ALPH⍳STRING is 5 4 1 6 3 2. 
Step 5 

Finally indexing the rows, the value of 

STRING[⍋(⍴ALPH)⊥⍉ALPH⍳STRING;] 
is the sorted list 

BASE 

BASEBALL 

CATCH 

CATCHER 

OUTFIELD 

THROW 
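
The five steps traced above can be reproduced in Python to 
check the intermediate values. This is a sketch under stated 
assumptions: sort_rows, decode, and ALPH are names chosen 
here, rows are padded on the right with blanks to a common 
width, and Python's arbitrary-size integers replace the 
floating point values that APL prints. 

```python
ALPH = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def decode(radix, digits):
    """Value of a digit vector in the given radix."""
    value = 0
    for digit in digits:
        value = value * radix + digit
    return value


def sort_rows(rows):
    """Return (sorted rows, 1-origin grade vector, numeric keys)."""
    width = max((len(row) for row in rows), default=0)
    padded = [row.ljust(width) for row in rows]
    # Steps 1 to 3: index each character in ALPH, then read each
    # row of indices as one number in radix len(ALPH).
    keys = [decode(len(ALPH), [ALPH.index(ch) + 1 for ch in row])
            for row in padded]
    # Step 4: grade up, the positions of the keys in ascending order.
    grade = sorted(range(len(keys)), key=keys.__getitem__)
    # Step 5: index the rows by the grade vector.
    return [padded[i] for i in grade], [i + 1 for i in grade], keys


rows = ["CATCH", "THROW", "OUTFIELD", "BASEBALL", "BASE", "CATCHER"]
ordered, grade, keys = sort_rows(rows)
print(keys[0])
print(grade)
for row in ordered:
    print(row)
```

The first key printed is the exact integer behind the first 
value shown in Step 3. 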

3.1.4 TRAC 

(Refer to Figure 3.5.) 

The TRAC version of the bubble sort provides many 
contrasts with the previous programs. For instance, there 
are no arrays in TRAC. However, there is a way to get 
around this deficiency. The variables that need to be 
array-like could be named A1, A2, A3, etc. Then :(A:(J)) 
can be used to reference a[j]. 

In TRAC, as in APL, character strings that need to be 
compared must have the same length. Otherwise, when 
comparing two strings of unequal lengths, the shorter of the 
two will be right-justified and padded to the left with 
zeros. This contrasts with the left-justification of 
character strings done in SNOBOL4 and PL/I. For 
alphabetization, therefore, the program must provide left 
justification. 

There is no equals operator that may be used to compare 
two numbers. In SORT1 of the TRAC program GR must be used 
twice to test for equality. 

The TRAC bubble sort is organized as a series of calls 
to NEXT, SORT2, NEW, SORT1, LOOP1, LOOP2, and PRINT. 
: (DS,N,6) • 
: (DS,T» 0) • 
: (DS, J, 1) • 

: ( DS, NE XT, (: (DS,T, 1) : (DS,I,0) : (SORT2) > ) ' 

: (DS,50RT2, (: (GR,: (T) ,0, (: (NEW)) , (: (DS,K,1): (PRINT))))) • 

: (DS,loop2, (: (DS,TEMP, : (A: (J) ) ) : (DS,A: (J) , : (A: (AD, : (J) , 1) )) : 

(DS, A: (AD,: (J) ,1),: (TEHP) ): (DS,T, 1) : (SORT1) ) ) ) ) ) • 

: ( ds, SORT1 , (: JGH,: (J) , : (I) , (: (L00P1) ) , ( : (GR , : (I) , : (J) , (: (LOO 

P1) ) , (: (SO HT2) ))))))' 

: (DS,L00P1 , (; (DS,J, : (SU, : (J) , 1) ) : (LG, : (A: (J) ) (A: (AD,: (J) , 1 
)) , (: (L00P2)) , (: (S0RT1) ) ) )) • 

: (DS, PRINT, (: (PS,: (A: (K) ) ) : (DS,K.: (AD.: (K) ,1) ) : (GR.K,N, # (: (P 
RINT ) ) ) ) ) ' 

: (DS,NEW, (: (DS.T,0) : (DS,I,: (AD,: (I) , 1) ) : (DS,J, : (N) ) : (S0RT1) ) 

) • 

: (DS, ASSIGN, (: (D5,I,0) : (ALOOP) ) ) • 

: (DS, A loop, (: (DS,I, : (AD,: (I) ,1)) : (GR,: (I),: (N) , {: (NEXT) ) , [: { 
os, A: (I) ,: (RS) ) :(AL00P) )))))))• 

: (ASSIGN) 'CATCH 'THROW ' OUTFIELD* EASEBALL* BASE 'CATCH 
ER • 

BASE BASEBALLCATCH CATCHER OUT FI ELDTHROW 



Figure 3.5 TRAC Program for Problem 1. 



SORT2 tests for T greater than 0, which indicates that 
more interchanges are necessary. If T is not greater than 
0, K is initialized to 1 and the program branches to PRINT. 
When T is greater than 0, NEW is called. 

NEW resets T to 0, increments I by 1, sets J equal to 
N, and calls SORT1. 

In SORT1 the GR primitive is used to compare J with I. 
If J is greater than I, LOOP1 is called. Otherwise J and I 
must be compared again, using GR. If I is not greater than 
J, then I and J must be equal and the program branches to 
SORT2. 

LOOP1 decrements J by 1. Next, a[j] and a[j+1] are 
compared using the lexical ordering primitive LG. If a[j] 
is lexically greater than a[j+1], LOOP2 is called to 
interchange the two. Otherwise SORT1 is called. 

LOOP2 switches a[j] and a[j+1] and sets T to 1 to 
indicate that an interchange has taken place. SORT1 is 
called. 

PRINT is defined recursively. Each time PRINT is 
called it prints a string, increments K, and calls itself. 
When K exceeds N, the program stops. 

3.2 PROBLEM 2 

(See Figures 3.6, 3.7, 3.8, and 3.9.) 

          TEXT = TRIM(INPUT) 
          TEXT1 = TEXT 
LOOP      TEXT BREAK(' ') . WORD LEN(1) =          :F(END) 
          IT = SIZE(WORD) - 1 
CHECK     WORD ANY('AEIOU') LEN(IT)                :F(LOOP) 
          OUTPUT = WORD                            :(LOOP) 
END 

I 
ALL 
A 

Figure 3.6 SNOBOL4 Program for Problem 2. 



VOWEL:  PROCEDURE OPTIONS (MAIN); 
        DECLARE WORD CHARACTER (15) VARYING, 
                TEXT CHARACTER (80) VARYING, 
                TEXT1 CHARACTER (80) VARYING, 
                L CHARACTER (1); 
        GET EDIT (TEXT) (A(80)); 
        TEXT1 = TEXT; 
LOOP:   PT = INDEX (TEXT,' '); 
        IF PT = 0 THEN GO TO PRINT1; 
        WORD = SUBSTR (TEXT,1,PT-1); 
        TEXT = SUBSTR (TEXT,PT+1); 
        L = SUBSTR (WORD,1,1); 
        IF L='A' | L='E' | L='I' | 
           L='O' | L='U' 
        THEN PUT EDIT (WORD) (A(15)); 
        GO TO LOOP; 
PRINT1: END VOWEL; 

I ALL A 

Figure 3.7 PL/I Program for Problem 2. 



V V0W2 

[I] TEXT*f\ 

[2] TEXT1+TEXT 

[3] TEXT*' \TEXT % * • 

[4] LIST*" 

[5] VEC+( TEXTe ' % )/\pTEXT 

T6] r+o 

[7] 7M7:-»-( (7«-I+l )=p^t7)/0 

[ 8 ] WORD+TEXTl VECtIl+ \ ( VFCt J+l ]- ( Wft7[J] + 1 ) ) ] 

[9] TEST-MW0RDltfc'AEIOU')/PR 

[10] 

[II] PRilIST*-LIST t % \WORD 
[12] -»JM7 

7 



row 

I WMW TO LIS? ALL THAT BECJff WITH A VOWEL 



LIST 
I ALL A 



Figure 3.8 APL Program for Problem 2. 



:(DS,TEXT,I WANT TO LIST ALL WORDS BEGINNING WITH A VOWEL) 1 

: (CS,TEXT1 ,: : (TEXT) ) • 

: (SS,TEXT, ) • 

: (DS, VOWEL, AEIOU) • 

: (DS , WORD, (: : (CS, TEXT) ) ) » 

: (DS,CHAR, (: : (CC,W) ) ) • 

: (DS, NEW WO ED, (: (G R, : (RP, TEXT) ,0, (: (DS,W, : (WORD)): (DS, LET, : (C 

HAR) ) : (COM FAR) )))))» 

: (DS,COMPAB, (: (EQ, : (LET) , :: (CC, VOWEL) , (: (PRINT) ) , (: (GR r : (RP, 

VOWEL) ,0, (: (COMPAR) ) , (: (TEST) ))))))« 

: (DS, PRINT, (: (CR,W) : (PS,: (W) ) : (TEST) )) • 

: (DS, TEST, (: (CR, VOWEL) : (NEWWORD) ) ) ) ) » 

: (NEWWORD) • 

I ALL A 



Figure 3.9 TRAC Program for Problem 2. 

The purpose of this problem is to list all the words in 
a line of text that begin with a vowel. For simplicity 
there is no punctuation. 

Words have to be isolated. In SNOBOL4 the BREAK 
function, in conjunction with a conditional variable WORD, 
does this. The ANY function of SNOBOL4 is convenient for 
matching any of the vowels with the first character of WORD. 
In PL/I each vowel must be compared individually. Again 
SNOBOL4's pattern matching superiority is apparent. The 
SNOBOL4 and PL/I programs dispose of a word in TEXT after it 
is assigned to variable WORD. 

A different approach is taken in APL since TEXT is an 
array. The index of each blank character is placed in 
vector VEC. Each word is isolated and checked. 

The TRAC program is organized as a series of calls to 
NEWWORD, COMPAR, PRINT, and TEST. The cursor of VOWEL must 
be reset before comparisons with each word. W is the 
current word under consideration. LET is the first letter 
in the current word. 
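
For comparison with the four programs, the same problem can 
be stated in a few lines of Python. This sketch is not one of 
the thesis programs; the function name is hypothetical, and 
words are assumed to be separated by single blanks with no 
punctuation, as in the problem statement. 

```python
VOWELS = "AEIOU"


def words_starting_with_vowel(text):
    """List the words of text whose first letter is a vowel."""
    words = [word for word in text.split(" ") if word]
    return [word for word in words if word[0] in VOWELS]


line = "I WANT TO LIST ALL WORDS BEGINNING WITH A VOWEL"
print(" ".join(words_starting_with_vowel(line)))
```

Splitting on blanks plays the role of BREAK in SNOBOL4 and of 
INDEX with SUBSTR in PL/I; the membership test plays the role 
of ANY. 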



3.3 PROBLEM 3 

An interesting problem that illustrates many of the 
operations needed in string handling is the following. 

Consider a student sitting at a terminal who is 
answering questions in a foreign language drill. The 
interactive system types a question that the student is to 
answer. If the student types the correct answer, the system 
responds with an R and types the next question. If the 
student missed the answer, he must try another reply. It 
would be helpful for the student to receive feedback that 
some of his answer was correct. For example, consider this 
hypothetical drill in English. The student's answers are 
preceded by a question mark. 

What is the capital of France? 
?Marseilles 

_a r . 

?Paras 
Par_s 
?Paris 
R 

What are the three R's? 
?reeding, riting, awrithmetick 
reading, _riting, arithmetic 
?reading, writing, arithmetic 
R 

The procedure for comparing the student's answer with 
the correct answer is as follows. If the two answers are of 
equal length, they are compared, and R is returned if they 
are the same. If the two answers are not of equal length or 
are of equal length and not the same, the student answer is 
searched from left to right for n-character length sequences 
of the correct answer. 

Assume that the value of n is first 7, then 2. In 
the second drill question 'reading' would be the first 
sequence the student answer is searched for, 'eading,' is 
the second seven character length sequence, 'ading, ' the 
third, etc. No match occurs until 'riting,'. 

When a match occurs, the letters following the matched 
seguence in the correct answer (II) and student answer (S) 
are searched one by one until the letter in 1 and the 
corresponding letter in S are not the same. For example, 
after "riting, • is found in S the characters • • and # a § 
will also be matched. In programming the problem, filler 
characters, the asterisk and the slash, are substituted for 
the matr-hed characters in H and S, respectively. For 
instance, in the previous example, after 9 riting, a* is 
matched, H and S would be: 
M 

reading, ******** **rithmetic 

S 

reeding, /////////writhmetick 
In future match attempts substrings with / f s and *'s are 
ignored. The seguence •rithmet* would match a substring in 
N successfully, and the subsequent 9 m c* would also match. 
Thus, after all 7-length seguencos are tried, W and S would 
be: 
M 

reading, ******************* 

S 

reeding, //////// */////////* 




Next, M is searched for all possible 2-character length
sequences in S that match M substrings. 're' matches, but
no additional characters do, so
M

**ading, w******************

S

//eding, /////////w/////////k

The process continues until all possible substrings have
been tried.

The M string is converted to an answer for the student.
Every asterisk now in M will print as the character it
stands for. For example, the letter 'r' will be substituted
for the first letter in M in the answer, and 'e' for the
second. Any remaining character, other than a blank, will be
replaced in the answer by the underline character (_). Blanks
are given in the returned answer. In addition to an answer with
blanks and underlines, the student receives a percentage of
the letters in his answer that appear in the correct answer.
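
The procedure described above can be sketched compactly in a modern
language. The following Python fragment is an illustration only and is
not one of the thesis programs: the function name drill_response and its
parameters are assumptions, indices are 0-based, and the final step of
the thesis programs (forcing the last character of the response to a
period) is left out.

```python
def drill_response(correct, student, lengths=(7, 2)):
    """Return (response, fraction), or ("R", 1.0) when the answer is right."""
    if correct == student:
        return "R", 1.0
    m = list(correct)   # M: matched characters are overwritten with '*'
    s = list(student)   # S: matched characters are overwritten with '/'
    for n in lengths:
        ks = 0          # 0-based start of the S-substring being tried
        while ks + n <= len(s):
            piece = "".join(s[ks:ks + n])
            # Substrings that already contain a slash are ignored.
            a = -1 if "/" in piece else "".join(m).find(piece)
            if a < 0:
                ks += 1
                continue
            # Extend the match one character at a time past the n characters.
            extra = 0
            while (a + n + extra < len(m) and ks + n + extra < len(s)
                   and m[a + n + extra] == s[ks + n + extra]):
                extra += 1
            width = n + extra
            m[a:a + width] = "*" * width
            s[ks:ks + width] = "/" * width
            ks += width
    # Asterisks print as the original letter, blanks stay, the rest become '_'.
    response = "".join(
        c if mark == "*" else (" " if c == " " else "_")
        for c, mark in zip(correct, m)
    )
    return response, m.count("*") / len(correct)
```

Called as drill_response('MA SOEUR EST MARIEE.', 'MA SIR ET MARREE.'),
the sketch returns the response 'MA S___R E_T MAR_EE.' and the
fraction 0.75.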

The flowchart for the program (Figure 3.10) follows.

SNOBOL4 has many string manipulating functions that
were useful in writing the program. The SUBSTR and INDEX
functions of PL/I were sufficient to do the necessary string
processing in that language. However, the program was not
as easy to do in APL/360. Even though APL provides indexing
(the iota operator), it lacks an equivalent of the PL/I
INDEX function.

Figure 3.10 Flowchart for Problem 3.

Some of the variable names used are the same in all the
programs. M is the correct answer; M1 is a copy of M. S is
the student's answer. M and S change as matches are found.
J counts the number of characters that match; JJ is the
fraction of characters (J/(size of M)) that matched. N
indicates how many characters are to be matched at once. N
must be less than or equal to the minimum of the sizes of M
and S. To be useful, however, the values of N should be
small. L indicates which value of N is currently being
used. RESULT is the string that is returned to the student.

As matches occur, asterisks replace the matched characters
in M, and slashes replace the matched characters in S.

In the PL/I and APL programs, KS is equal to the
position of the first character in the S-substring that is
about to be checked. However, in the SNOBOL4 program, KS is
equal to the current value of the cursor, the index of the
character in S before the one about to be checked.

3.3.1 SNOBOL4

(Refer to Figure 3.11.)

The patterns MPADPAT and SPADPAT match N<L> characters
in the patterns STARS and SLASHES, respectively. MPAD and

*  N = NO. OF CHARACTERS TO BE MATCHED
*  J COUNTS NUMBER OF CHARACTERS THAT MATCHED
          N = ARRAY(2)
MOREN     N<1> = TRIM(INPUT)
          N<2> = TRIM(INPUT)
*
*  PATTERNS TO BE USED IN PROGRAM
          MPADPAT = LEN(*N<L>) . MPAD
          SPADPAT = LEN(*N<L>) . SPAD
          STARS = '*******'
          SLASHES = '///////'
          S2 = LEN(*KS) *TAB(N<L> + KS) . S3
          S4 = *LEN(KK + I) *TAB(KK + I + 1) . S5
*
*
MAINLOOP  M = TRIM(INPUT)                       :F(THRU)
          OUTPUT =
          OUTPUT =
          OUTPUT = M
          S = TRIM(INPUT)                       :F(THRU)
          OUTPUT = S
          J = 0
*  COUNT NO. OF CHARACTERS IN M AND S
          MCOUNT = SIZE(M)
          SCOUNT = SIZE(S)
          EQ(MCOUNT,SCOUNT)                     :F(SET)
*
*  IS M-SUBSTRING EQUAL TO S-SUBSTRING?
          IDENT(M,S)                            :F(SET)
          OUTPUT = ' R'                         :(MAINLOOP)
*
*  INITIALIZE VARIABLES
SET       L = 1
          RESULT = ''
*  NEED A COPY OF M
          M1 = M
*  KS POINTS TO CHARACTER BEFORE ONE TO BE MATCHED
SRESET    KS = 0
*  SET MPADPAT TO A PATTERN OF N<L> STARS AND
*  SPADPAT TO A PATTERN OF N<L> SLASHES
          STARS MPADPAT
          SLASHES SPADPAT



Figure 3.11 SNOBOL4 Program for Problem 3. 



SLCCP GT(KS + N<L>,SCOUNT) :S (NEWN) 

* ISOLATE NEXT N<L> CHARACTERS IN S 

S S2 

* ^NY SLASHES IN S-SUBSTRING? 

S3 ANY(V') :S(KSINC) 
CHECK FOR A WATCH; IF SUCCESSFUL, FILL IN * * S FOR HATCHED 

CHARACTERS IN M AHD /• S FOR MATCHED CHARACTERS IN S 
K POINTS TO THE LAST MATCHED CHARACTER IN M 
KK POINTS TO THE LAST HATCHED CHARACTER IN S 

H S3 oDK = MPAD :F(KSINC) 
S S3 cDKK = SPAD 
CHECK FOR ADDITIONAL CHARACTERS THAT HATCH; 
FILL IN ••S AHD /«S 
I = 0 
AGAIN S SU 

TAB {K f I) . HEAD S5 = HEAD :F(CALC) 



* 



S 
H 
S 
I 



I) 



* AT 



* YES, 
CALC 
* 

NEWN 



TA B {KK + 
= 1+1 
LEAST ONE MORE CHAR. 
GT (K + I,HCOUNT) 
GT (KK + I,SCOUNT) 
AT LEAST ONE MORE CHAR. 
K5 = KK + I 



TAIL S5 = TAIL 



IN M AND S? 



L = L + 
Eg (L,3) 



1 



:S (CALC) 
:F (AGAIN) 

: (sloop) 



:F (SRESET) 



PREPARER H LEN ( 1 ) . TEMP = 

IBENT (TEMP, • *•) 
ZA IDEN5? (TEMP, 1 ') 

RESULT = RESULT • • 
ZB RESULT = RESULT 

ZC J = J + 1 

M1 L E N ( 1 ) . ANSWER 

RESULT = RESULT ANSWER 
ZD Ml LES(1) = 

* 
* 

* REPLACE LAST CHARACTER WITH A PERIOD 

PREOUT . RESULT RTAB ( 1) . TEMPI LEN ( 1 ) * TEMP 1 



:F (PREOUT) 
:S (ZC) 
:F(ZB) 

: (ZD) 
: (ZD) 



: (PREPARER) 



1 1 



Figure 3.11 (cont.) 



OUT OUTPUT = 

OUTPUT = RESULT 
* CONVERT TO REAL NUMBERS 

AJ = CONVERT (J, 'REAL') 

AMCOUNT = CONVERT (MCOUNT, ♦ REAL') 

AJJ = A.? / AMCOUNT 

OUTPUT = AJJ 

* 

KSINC KS = KS + 1 
* 

THRU 
END 



: (MAINLOOP) 
: (SLOOP) 



DAS HAUS IST NICHT GROSS.          ←CORRECT ANSWER
DAS VATERHAUS IS VERNICHTET.       ←STUDENT'S ANSWER

DAS HAUS IS_ NICHT _____.          ←COMPUTER RESPONSE
0.6399999                          ←PERCENTAGE OF CORRECT LETTERS

MA SOEUR EST MARIEE.
MA SIR ET MARREE.

MA S___R E_T MAR_EE.
0.7500000

CETTE LECON EST DIFFICILE.
CET LECON EST DIFISEAL.

CET__ LECON EST DIF______.
0.6538461

LA JEUNE FILLE EST JOLIE.
LA JEUNE FILLE EST JOLIE.
 R



Figure 3.11 (cont.)

SPAD are strings equal to the N<L> characters in the
patterns MPADPAT and SPADPAT, respectively. The pattern
STARS is used to replace matched characters in M. Similarly,
SLASHES is used to replace matched characters in S.

S2 matches N<L> characters in S, beginning with the
(KS+1)st character; S3 is a string equal to those N<L>
characters.

After a match of N<L> characters in M has occurred, KK
is set to one less than the position of the next character
in S. Similarly, K is set to one less than the position of
the next character in M. I indicates the number of the
character past KK that is being checked for a match.

S4 is a pattern which matches the (KK+I+1)st character
with a character in S. S5 is the string containing that
character. If S5 is the (K+I+1)st character of M, a star and
slash are substituted in M and S, respectively.

3.3.2 PL/I

(Refer to Figure 3.12.)

AA is the N(L)-length substring of S that starts in
position KS. M is searched for an occurrence of AA. If
there is a match, then A is set to the index of the match.

3.3.3 APL

(Refer to Figure 3.13.)

PROBLEM: PROCEDURE OPTIONS(MAIN);
     DCL RESULT CHARACTER(80) VARYING,
         (M,M1,S) CHARACTER(80),
         (MCOUNT,SCOUNT,N(2),KS,L,B,A,AAA) FIXED,
         AA CHAR(80) VARYING,
         JJ FIXED DECIMAL(6,5);
     ON ENDFILE(SYSIN) GO TO THRU;
/* N IS NUMBER OF CHARACTERS TO BE MATCHED */
     GET LIST((N(L) DO L=1 TO 2));
/* J IS THE FRACTION OF MATCHED CHARACTERS PER STRING */
/* READ CHARACTERS INTO CHAR. STRING VAR.'S M AND S. */
MAINLOOP: GET EDIT(M) (SKIP,A(80));
     GET EDIT(S) (A(80));
/* PRINT THE STRINGS */
     PUT SKIP(3) EDIT(M) (A(80));
     PUT EDIT(S) (SKIP,A(80));
/* INITIALIZE NO. OF MATCHED CHARACTERS */
     J=0;
/* COUNT NO. OF CHARACTERS IN EACH ARRAY */
     MCOUNT = INDEX(M,'.');
     SCOUNT = INDEX(S,'.');
     IF MCOUNT=SCOUNT
        THEN IF M=S THEN DO;
           PUT SKIP LIST(' R');
           GO TO MAINLOOP;
        END;
/* COPY OF M */
     M1 = M;
/* INITIALIZE RESULT */
     RESULT = '';
NEWN: DO L=1 TO 2;
/* KS IS EQUAL TO THE POSITION OF THE FIRST CHARACTER IN */
/* THE SUBSTRING OF S THAT IS BEING CHECKED */
     KS = 1;
SLOOP: IF KS+N(L)>SCOUNT+1 THEN GO TO NEWNEND;
/* ANY SLASHES IN S-SUBSTRING? */
/* AA IS THE N(L)-LENGTH SUBSTRING OF S, BEGINNING */
/* WITH THE CHARACTER IN POSITION KS */
     AA = SUBSTR(S,KS,N(L));
     AAA = INDEX(AA,'/');
/* IF A SLASH, GO TO NEWKS */
     IF AAA¬=0 THEN GO TO NEWKS;
/* IS S-SUBSTRING IN M? */
/* IF SO, A IS THE INDEX OF THE FIRST OCCURRENCE OF AA */
     A=INDEX(M,AA);


Figure 3.12 PL/I Program for Problem 3. 



     IF A¬=0
     THEN DO;
/* YES, S-SUBSTRING IS IN M */
        DO B = 0 TO N(L)-1;
           SUBSTR(M,A+B,1)='*';
           SUBSTR(S,KS+B,1)='/';
        END;
/* DO ANY ADDITIONAL CHARACTERS MATCH? */
        DO I=0 BY 1
           WHILE(A+N(L)+I<=MCOUNT & KS+N(L)+I<=SCOUNT);
           IF SUBSTR(M,A+N(L)+I,1) =
              SUBSTR(S,KS+N(L)+I,1)
           THEN DO;
              SUBSTR(M,A+N(L)+I,1)='*';
              SUBSTR(S,KS+N(L)+I,1)='/';
           END;
           ELSE DO;
              KS=KS+N(L)+I;
              GO TO SLOOP;
           END;
        END;
     END;
/* NO, S-SUBSTRING IS NOT IN M */
     ELSE NEWKS: KS = KS+1;
     GO TO SLOOP;
NEWNEND: END NEWN;
/* PRINT PARTIALLY MATCHED STRING */
PREPARER: DO I=1 TO MCOUNT;
     IF SUBSTR(M,I,1) = '*'
     THEN DO;
        RESULT = RESULT || SUBSTR(M1,I,1);
        J = J+1;
     END;
     ELSE IF SUBSTR(M1,I,1) = ' '
        THEN RESULT = RESULT || ' ';
        ELSE RESULT = RESULT || '_';
END PREPARER;
/* MAKE SURE LAST CHARACTER IS A PERIOD */
     SUBSTR(RESULT,MCOUNT,1)='.';
     PUT EDIT(RESULT) (SKIP(2),A(80));
     JJ=J/MCOUNT;
     PUT SKIP LIST(JJ);
OUT: GO TO MAINLOOP;
THRU: END PROBLEM;



Figure 3.12 (cont.)

DAS HAUS IST NICHT GROSS.
DAS VATERHAUS IS VERNICHTET.

DAS HAUS IS_ NICHT _____.
0.63999

MA SOEUR EST MARIEE.
MA SIR ET MARREE.

MA S___R E_T MAR_EE.
0.75000

CETTE LECON EST DIFFICILE.
CET LECON EST DIFISEAL.

CET__ LECON EST DIF______.
0.65383

LA JEUNE FILLE EST JOLIE.
LA JEUNE FILLE EST JOLIE.

 R



Figure 3.12 (cont.) 



9 STRINGS 

MAINLOOP-.M*® 

Ml+M 

L*-l 

J+-0 

+((pM)*pS)/NEWN 
ITER M MLlltSZn ) /NEWN 

+(.(.i+i+i)*pM)/iter 
[>•/?• 

■+MAINLO0P 
NEWN-.KS+-1 
RESULT-*-' ■ 

SLOOP'M (,KS+NtL])>l+pS)/NEVNEND 
D+S SUBSTR KS % NiL2 
■+(.(. ni'/')<l+pD)/NEWKS 

HIGH+O 

TEST1 : ( HIGH-*-HIGH+RIGHl+[ /TEMP*MLHIGRi\pM'] \ D) =l+pM) /NEWKS 

SVB+{ TEMPx HI GUI ) - 1 

-K ( ( HIGH-SUB)+NIL1-1) >pM)/NEWKS 
TEST-MMt(HlGH-SUB)+( ( i NIL] ) -1 ) 3*Z>) /TEST! 

A+-HIGH-SUB 

B-«-0 

-*•( (^S+l)<ff[L])/>lZO0P 

BLOOP-M ~( (i4+/VCZi]+-r)SpW) A( (/f5+/ltL]+I)5p5) ) /OTHER 
E+A+NLL1+I 
F+KS+NZLl+I 

■*{{M SVBSTR E t l)*iS SUBSTR F t D) /OTHER 
M[i4+tf[ £]+!]■♦-' *• 

+BLOOP 
OTHER : KS+KS+Nl I] +J 
-+.9LO0P 

■+SLOOP 

NEWNFND : -+( ( L«-L+l ) *3 ) /JPEIW 
PREPARER :!«-! 



Figure 3.13 APL Program for Problem 3. 



[43] CL00P:MMLI'1* , * , )/ZA 
[44] RESULT+RESULT,M\in 
[4 5] J+J+l 
[46] 

[47] Z/1:-^(W1[J]=' ')/Z/? 
[48] RESULT-RESULT, • _ • 
[49] -vJATC 

[50] Z#: RESULT-RESULT, ' 1 
[51] J^:-^((J^I+l)SpW)/CJW0P 
[52] FINAL :RESULTlpM]*-' . ' 
[53] ^RESULT 
[54] DWV»pW 
[55] OUT:-*MAINLOOP 
V 



      STRINGS
□:
      7 2

DAS HAUS IST NICHT GROSS.
DAS VATERHAUS IS VERNICHTET.

DAS HAUS IS_ NICHT _____.

0.64

MA SOEUR EST MARIEE.
MA SIR ET MARREE.

MA S___R E_T MAR_EE.

0.75

CETTE LECON EST DIFFICILE.
CET LECON EST DIFISEAL.
CET__ LECON EST DIF______.
0.6538461538

LA JEUNE FILLE EST JOLIE.
LA JEUNE FILLE EST JOLIE.
R



Figure 3.13 (cont.) 



M is searched for an occurrence of the S-substring D.
HIGH1 is set to the maximum of the indices of the occur-
rences in M of the letters contained in D. If HIGH, the sum
of HIGH1 and the previous value of HIGH, is equal to 1 + the
size of M, then one or more of the letters in D does not
occur in M and the program branches to NEWKS. SUB is
assigned one less than the position of HIGH1 in TEMP. The
substring in M of length N(L) beginning with the character
in the (HIGH-SUB)th position is compared with D. The
program branches to TEST1 if the substrings do not match.



4 COMPARISONS AND DISCUSSION



The first part of this chapter briefly mentions some of
the different features in each language: data formats,
statement formats, storage allocation, input/output, and
subroutine capability. Next follows a discussion of string
operations. Some string operations that are primitive in
one language, but not in others, are coded in the other
languages.



4.1 DATA FORMATS
4.1.1 SNOBOL4

The data of SNOBOL4 include both character strings and
numbers, although operations on numbers are not an important
part of the language. Conversion is done automatically
between numbers and strings. For example, 'ABC' 3 is
equivalent to 'ABC3' and '12345' + 1 is equivalent to 12346.
Patterns are built from strings by using alternation and
concatenation. None of the other three languages has a
pattern data type.

4.1.2 TRAC

In the TRAC language both instructions and data are
strings. If arithmetic primitives are called, the parameters
will be treated as numbers. Each instruction string is
evaluated and replaced with a value string, which may itself
be evaluated in turn.

4.1.3 APL

The data of APL are characters and numbers. A character
vector, however, is a vector of single characters and
not a string of characters, as is the case in SNOBOL4, PL/I,
and TRAC. Arrays may be formed using characters or numbers.
Conversion between numbers and characters is not done, and
it is not permissible to mix the two data types.

4.1.4 PL/I

Data in PL/I consist mainly of fixed and floating point
numbers and character and bit strings. Arrays and structures
can be made from the data. Each identifier or
variable is considered to have attributes which usually are
specified in DECLARE statements. Strings and their maximum
lengths are not declared in the other three languages, but
this must be done explicitly in PL/I. Conversion is done
automatically between numbers and strings.

4.2 STATEMENT FORMATS
4.2.1 SNOBOL4

All statements in SNOBOL4 are of the form

     label subject pattern = object goto

In various uses some of the five parts are omitted. This
statement format permits pattern matching to be specified
easily.

4.2.2 TRAC

All statements in TRAC are written as

     :(FCN,p[1],p[2],...,p[k])

where FCN is a two-letter TRAC primitive and p[1], p[2], ...,
p[k] are arguments.

4.2.3 APL

There are two types of statements in APL, branch and
specification. Specification statements are similar to
assignment statements. Branch statements are used chiefly
in function definitions.

4.2.4 PL/I

Unlike the other languages, there are many different
statement types in PL/I. These include the DECLARE statement,
assignment statement, DO statement, IF statement,
input/output statements, and others.

4.3 STORAGE ALLOCATION
4.3.1 SNOBOL4

Storage allocation is done dynamically. When storage
space is filled, the storage is regenerated. That is, all
needed data are collected, and all data inaccessible to the
SNOBOL4 program are deleted. The user is unaware of this
process. Such programming techniques as building patterns
in a loop use a lot of storage and should be avoided to
prevent frequent storage regenerations. The user does not
reserve space explicitly for variables, except for arrays.

4.3.2 TRAC

Storage is divided into several areas by the TRAC
interpreter. User operations specified by the define string
primitive are kept in a form store. The active string stack
and the neutral string stack contain only the parts of the
current instruction that are being evaluated.

4.3.3 APL

Storage reservation is done implicitly by the APL
system. That is, the user does not have to declare any
variables explicitly. Storage reservation in TRAC and
SNOBOL4 is also implicit. When using the APL system, the
user has his own working storage, called a workspace. An
active workspace has room for internal system needs, storage,
and transient information. When inactive, a workspace is put
in a library on secondary storage.

4.3.4 PL/I

Storage space for variables is allocated within begin
and procedure blocks. Usually the DECLARE statement is used
to reserve the space. Unlike the other languages, the
maximum size of a character string must be specified in the
DECLARE statement.

4.4 INPUT/OUTPUT
4.4.1 SNOBOL4

Input/output is done by "association". The variable
INPUT is usually associated with the card reader, and the
variable OUTPUT is usually associated with the printer. For
example,

     TEXT = INPUT

assigns to TEXT the data on the next input card.

     OUTPUT = LINE

assigns to OUTPUT the information in variable LINE; numbers
are automatically converted to character strings for
printing.

4.4.2 TRAC

Input/output operations are not given special treatment;
TRAC primitives handle these operations. RS reads a
string from the input device and PS prints a given string.

4.4.3 APL

Whenever an expression or variable is typed by itself,
the APL system responds by printing the value of that
expression or variable.

Within function definitions, if a quad character □ is
written to the right of the specification arrow, the system
types

□:

and waits for the user to type an expression. Also within
function definitions, if a quad character with a quote mark
inside it (⍞) is written to the right of the specification
arrow, the system stops and waits for character input to be
typed.

4.4.4 PL/I

Input/output in PL/I may be stream-oriented. Data are
regarded as one continuous stream of information, not
constrained to physical record sizes. GET and PUT are
associated with stream input/output.

However, the user does have the choice of using record
input/output. Data are organized into logical records which
are treated as a whole.

4.5 SUBROUTINE CAPABILITY
4.5.1 SNOBOL4

The user may define functions by using the DEFINE
function. After a function has been defined, it may be
invoked the same way as built-in SNOBOL4 functions.

4.5.2 TRAC

New operations can be defined using DS primitives.
Since there is no iteration in TRAC, recursion must be used
frequently in these operation definitions.

4.5.3 APL

Defined functions give subroutine capability. If they
have arguments and return a value, they may be defined as
either a binary or a unary operator.

4.5.4 PL/I

PL/I permits both internal and external subroutines
(procedures). Some procedures may be called as functions
and return a value.

4.6 BASIC STRING OPERATIONS

From the many accounts that I have read ([3] and [19]),
I regard the following as the most basic of all string
operations:

     concatenation of two strings
     insertion of a substring
     deletion of a substring
     pattern matching or PL/I INDEX

Another operation, pattern matching with replacement, is
often regarded as primitive (for example, in SNOBOL4).
However, it is a combination of all the above. Pattern
matching with replacement involves finding the occurrence of
a substring in a string (pattern matching), replacing it
with either a nonnull substring (insertion) or the null
string (deletion), and then putting the string together
again (concatenation).
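
The decomposition just described can be made concrete. The following
Python sketch is not taken from any of the four languages compared
here; the function name replace_first is an assumption made for
illustration. It builds pattern matching with replacement out of the
basic operations listed above: a search, a split around the match, and
concatenation.

```python
def replace_first(text, pattern, replacement=""):
    """Replace the first occurrence of pattern; an empty replacement deletes it."""
    start = text.find(pattern)             # pattern matching (like PL/I INDEX)
    if start < 0:
        return text                        # no match: the string is unchanged
    head = text[:start]                    # everything before the match
    tail = text[start + len(pattern):]     # everything after the match
    return head + replacement + tail       # insertion or deletion, then concatenation
```

For example, replace_first('HE WAS THE RIGHT ONE', 'THE', 'A') gives
'HE WAS A RIGHT ONE', and omitting the third argument deletes the
first 'THE'.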

4.6.1 Concatenation

Concatenation of two strings is the most basic of all
string operations. Concatenation is done in SNOBOL4 by
implication. For instance, consider the SNOBOL4 statement

     sub pat1 pat2 = obj1 obj2

In this example the pattern used is the concatenation of
pat1 and pat2. Similarly the object is the concatenation of
obj1 and obj2.

In APL and PL/I, on the other hand, an explicit
operator for concatenation is used. In APL this operator is
the comma, and a restriction is imposed that characters and
numbers cannot be concatenated. The symbol || joins two
strings to be concatenated in PL/I, and unlike APL, automatic
conversion to characters is done if a number is found.

TRAC, like SNOBOL4, does concatenation implicitly. The
results of evaluating two macro calls written next to each
other are concatenated. Frequently one or both of these
calls returns a null value, even though side effects occur.

4.6.2 Insertion of a substring

Suppose it is desired to insert the word 'THE' after
the tenth character of string STR. This could be done in
SNOBOL4 as follows:

     STR LEN(10) . VAR1 = VAR1 'THE'

The above statement replaces the first ten characters of
STR, assigned to conditional variable VAR1, with VAR1
concatenated with the word 'THE'.

The same operation could be done in PL/I with the
statement

     STR = SUBSTR(STR,1,10) || 'THE' || SUBSTR(STR,11);

The following APL statement will do the insertion:

     STR←(10↑STR),'THE',10↓STR

(See section 2.3.5 for an example of ↑ and ↓.)

The following TRAC definition for STR will do the same
operation as the above three:

     :(DS,STR,:(CN,STR,10)THE:(CS,STR))
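
For comparison, the same insertion can be written with substring
slicing in a present-day language. The Python sketch below is only an
illustration; the helper names insert_after and delete_range are
assumptions, not names from any of the four languages. A deletion
helper is included because the next section performs the inverse
operation with the same split-and-concatenate idea.

```python
def insert_after(text, count, piece):
    """Insert piece after the first count characters of text."""
    return text[:count] + piece + text[count:]


def delete_range(text, first, last):
    """Delete characters first through last, counted from 1 as in the text."""
    return text[:first - 1] + text[last:]
```

With a 16-character string 'ABCDEFGHIJKLMNOP', insert_after(s, 10, 'THE')
yields 'ABCDEFGHIJTHEKLMNOP' and delete_range(s, 11, 13) yields
'ABCDEFGHIJNOP'.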

4.6.3 Deletion of a substring

Consider the operation of deleting the eleventh through
thirteenth characters of STR. The following SNOBOL4 state-
ment will do this:

     STR POS(10) LEN(3) =
The following PL/I statement will do the deletion:

     STR = SUBSTR(STR,1,10) || SUBSTR(STR,14);

In APL the operation could be done as follows:

     STR←(10↑STR),13↓STR

This operation is rather complicated when written in TRAC.
Consider

     :(DS,STR,:(CN,STR,10):(EQ,:(CN,STR,3),):(CS,STR))

The hard part is to move STR's form pointer ahead to the
fourteenth character from the tenth without getting the
characters in between. The above use of EQ does this.

4.6.4 Pattern matching



SNOBOL4 is really the only language of the four in
which it is easy to do complicated pattern matching tasks.
Consider the following task: replace the first occurrence
of PAT1 in the string STR with 'THE', or if PAT1 is not
present, replace the first occurrence of PAT2, or if PAT2 is
not present, replace the first occurrence of PAT3.

In SNOBOL4 only one statement is needed to do this:

     STR PAT1 | PAT2 | PAT3 = 'THE'

This operation requires more statements when done in PL/I:

     A=INDEX(STR,PAT1);
     B=INDEX(STR,PAT2);
     C=INDEX(STR,PAT3);
     IF A=0 THEN
        IF B=0 & C¬=0 THEN
           STR=SUBSTR(STR,1,C-1) || 'THE' ||
               SUBSTR(STR,C+LENGTH(PAT3));
        ELSE IF B¬=0 THEN
           STR=SUBSTR(STR,1,B-1) || 'THE' ||
               SUBSTR(STR,B+LENGTH(PAT2));
        ELSE;
     ELSE STR=SUBSTR(STR,1,A-1) || 'THE' ||
              SUBSTR(STR,A+LENGTH(PAT1));



APL and TRAC code for this same problem would be extremely
long. The same problems in coding are shown in the
following examples of pattern matching with replacement.
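
The task as stated (try PAT1 first, then PAT2, then PAT3) can also be
sketched as a loop over the candidate patterns. The Python fragment
below follows the logic of the PL/I version above and is an
illustration only; the function name replace_first_present is an
assumption.

```python
def replace_first_present(text, patterns, replacement):
    """Replace the first occurrence of the first pattern that occurs in text."""
    for pattern in patterns:               # patterns are tried in priority order
        start = text.find(pattern)
        if start >= 0:
            return text[:start] + replacement + text[start + len(pattern):]
    return text                            # none of the patterns is present
```

For instance, replace_first_present('ONE TWO THREE', ['FOUR', 'TWO', 'ONE'], 'THE')
skips 'FOUR', which is absent, and returns 'ONE THE THREE'.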



4.6.5 Pattern matching with replacement



Consider a typical pattern matching problem such as
finding whether the word 'THE' is present in a sentence and
if so, deleting or replacing its first occurrence in the
sentence. The SNOBOL4 language is dedicated to doing just
this kind of problem.

The PL/I index function, INDEX(string,substring), finds
whether an occurrence of substring is present in string.
INDEX returns the index of the first character of the
matched portion of the string. If there is no match, a
value of 0 is returned. There is no way in PL/I to indicate
without an index value the success or failure of a pattern
match.

With its present string primitives, PL/I cannot answer
the question "Is a 'THE' present?" without also finding the
position in the sentence of the first 'THE', because the
INDEX function is the only way to determine whether a
substring is present in a string. For example, INDEX(SENT,
'THE'). The pattern matching with replacement operation in
PL/I must know the index and would be done as follows:

     SENT = SUBSTR(SENT,1,INDEX(SENT,'THE')-1) ||
            replacement ||
            SUBSTR(SENT,INDEX(SENT,'THE')+3);

SNOBOL4 uses the cursor function @ to give the position
of the match. For example,

     SENT @POSN 'THE'

POSN is assigned the cursor position of the first 'THE' in
SENT.

TRAC takes a different approach to the problem. Like
SNOBOL4 and unlike PL/I, it may find whether a substring is
present in a string without finding the index of this
occurrence. This is done with the Yes There (YT) primitive.
The same problem may also be solved using the IN primitive.
In that case everything in the string up to the substring to
be matched is returned as value. In either approach it is
unnecessary to know the index of the match.

Pattern matching with replacement is usually done with
the following sequence in TRAC: a define string primitive
(DS) defines the string, the segment string (SS) lists the
substring(s) to be replaced, and the call (CL) primitive
calls the string with the indicated replacements.

The sequence, involving macro (string) definition and
parameter calls, is inherent in the design of TRAC. If no
replacement for a parameter is given in the CL operation,
the null string is substituted for that parameter, thus
deleting it. However, this sequence is different from the
original problem because all occurrences, not just the
first, are changed.

To replace just the first occurrence, other primitives
must be used. For instance, consider the following. The
initial (IN) function finds the first occurrence of 'THE'.
The resulting value is the portion of SENT preceding 'THE'.
The form pointer now points to the first character after
'THE'. A :(CN,SENT,-3) instruction resets the form pointer
to the T of 'THE'. A left pointer primitive (LP) finds the
number of characters to the left of the pointer and assigns
these characters to a variable LEFT with a DS primitive. An
instruction with :(EQ,:(CN,SENT,3),) would move the form
pointer to the first letter after 'THE' and would give a
null result. Then the value of a call segment (CS) of SENT
would give the remainder of SENT.

Unfortunately APL does not have any readily available 
functions to solve pattern matching problems. This defi- 
ciency is the reason string problems are so difficult to 
code in APL. The deficiency is present because APL, which 
regards strings as arrays, operates uniformly on these 
strings. Thus string operations are done character by 
character; every character is treated the same. For 
example, using the index function, 

     'HE WAS THE RIGHT ONE' ⍳ 'THE'

examines the string 'HE WAS THE RIGHT ONE' to find first
the character T, then the character H, and finally character
E. The result, with 1-origin indexing, is the vector 8 1 2. To allow scanning for
the string 'THE', a fairly involved defined function must be
used, as in Chapter 3. 
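
The difference between character-by-character indexing and a true
substring search can be seen in a short sketch. The Python functions
below are illustrations only: index_each imitates the behavior of the
APL index function described above (1-origin, with one more than the
length returned for an absent character), and find_substring is a
hypothetical defined function of the kind the text says is needed. It
returns 0 on failure, following the PL/I INDEX convention.

```python
def index_each(text, chars):
    """1-origin index of the first occurrence of each character of chars."""
    return [text.find(c) + 1 if c in text else len(text) + 1 for c in chars]


def find_substring(text, sub):
    """1-origin index of the first occurrence of sub in text, or 0 if absent."""
    for start in range(len(text) - len(sub) + 1):
        # Compare the candidate window with sub one character at a time.
        if all(text[start + k] == sub[k] for k in range(len(sub))):
            return start + 1
    return 0
```

Here index_each('HE WAS THE RIGHT ONE', 'THE') locates each letter
separately, while find_substring('HE WAS THE RIGHT ONE', 'THE') returns
the single position at which the whole word begins.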

The replacement problem in APL is not difficult once
the substring is found. Suppose variable IND is assigned
the index of the first 'THE' in string SENT and variable
WORD is to be inserted in place of 'THE'. Then an
instruction

     SENT←((IND-1)↑SENT),WORD,(IND+2)↓SENT

would do the necessary replacement, assuming SENT has at
least one character.

4.7 OTHER STRING OPERATIONS

It is essential in doing string problems to be able to
find the size of a string easily. For instance, consider
scanning a string for the occurrence of several copies of a
substring. It would be desirable to know the length of the
substring so that when an occurrence is found, the length
could be used in maintaining a cursor for the start of the
next scan. SNOBOL4 has the SIZE function, PL/I the LENGTH
function, and APL the size function to do this operation.
Finding the length of a string is slightly harder in TRAC.
The form pointer must be set to the beginning of the string
by :(CR,string), and then :(RP,string) will return the
number of characters to the right of the form pointer, i.e.
the length of the string.

Another operation that should be readily available is
comparing two strings. Usually two functions are available
for this purpose, one to compare the strings for their
sameness and one to compare them for their difference, namely
IDENT and DIFFER in SNOBOL4, = and ¬= in PL/I, and equals
and not equals in APL. TRAC is different. :(EQ,X1,X2,t,f)
tests X1 and X2 for equality and branches to t or f
accordingly. IDENT and DIFFER are said to return values of
success or failure and then a separate instruction in the
go-to field indicates the branch.

Two strings must be of the same length to be compared
in APL. If strings X and Y are compared using the equals or
not equals operator, a vector the size of X (or Y) will be
returned. This vector indicates whether the characters in
the respective positions of X and Y matched. In SNOBOL4 and
PL/I, if the strings are not of equal length, the shorter of
the two is padded with blanks.
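
The two styles of comparison can be contrasted in a brief sketch. The
Python below is illustrative only and the function names are
assumptions: compare_elementwise mimics the APL result vector for
equal-length strings, and equal_padded mimics the blank-padded
comparison that the paragraph above describes.

```python
def compare_elementwise(x, y):
    """Return a vector of 1s and 0s marking the positions where x and y agree."""
    if len(x) != len(y):
        raise ValueError("strings must be of the same length")
    return [int(a == b) for a, b in zip(x, y)]


def equal_padded(x, y):
    """Compare x and y after padding the shorter one on the right with blanks."""
    width = max(len(x), len(y))
    return x.ljust(width) == y.ljust(width)
```

For example, compare_elementwise('PARIS', 'PARAS') gives 1 1 1 0 1, and
equal_padded('ABC', 'ABC  ') is true.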

A lexical ordering operator is also quite useful.
SNOBOL4 has LGT(X1,X2) to test whether X1 follows X2 in the
collating sequence of the machine being used. All the
relational operators of PL/I may be used to compare two
strings for lexical order with respect to the machine's
collating sequence. :(LG,X1,X2,t,f) in TRAC tests lexical
ordering depending on collating sequence and, as above,
branches accordingly. APL does not consider a machine's
collating sequence and thus can have no lexical ordering
operator. However, a user may define his own collating
sequence. For example,

     ALPH←'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
     L1←'C'
     L2←'E'

Now any relational operator may be used in place of the > in
the following:

     (ALPH⍳L1)>(ALPH⍳L2)

The index of the occurrence of L1 and L2 in ALPH serves the
purpose of a lexical operator, but again only for single
characters, not strings. (The decode operator can be used;
see Chapter 3.)
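
The idea of a user-defined collating sequence carries over directly to
other languages. The Python sketch below is an illustration only. The
names follows and collation_key are assumptions, and the extension to
whole strings by comparing lists of indices is one possible approach,
not the decode technique of Chapter 3.

```python
ALPH = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def follows(l1, l2, alph=ALPH):
    """True if character l1 comes after character l2 in the sequence alph."""
    return alph.index(l1) > alph.index(l2)


def collation_key(word, alph=ALPH):
    """Map a word to its list of positions in alph, for ordering whole strings."""
    return [alph.index(c) for c in word]
```

With the default sequence, follows('C', 'E') is false, and
sorted(words, key=collation_key) orders a list of words by the
user-defined sequence.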

The SUBSTR operator of PL/I turns out to be useful in
the other three languages as well. To review,

     X1 = SUBSTR(X,I1,I2)

assigns to X1 the I2 characters of X beginning with the I1th
character. In SNOBOL4 one might use

     X TAB(I1 - 1) LEN(I2) . X1

Similarly in APL

     X1←X[¯1+I1+⍳I2]

More thought is necessary to do the operation in TRAC. The
following would do the SUBSTR operation in TRAC.

     :(DS,SUBSTR,(
     :(CR,<1>)
     :(EQ,:(CN,<1>,:(SU,<2>,1)),)
     :(CN,<1>,<3>)))

Since it is permissible to eliminate the CL primitive,
SUBSTR could be invoked by

     :(SUBSTR,X,:(I1),:(I2))

instead of

     :(CL,SUBSTR,X,:(I1),:(I2))

(However, this SUBSTR function does not allow for the case
where argument <3> is omitted.)




The following two tasks frequently occur in lower level
assembler coding. One string handling task is to take a
string, define two lists of characters, and then replace the
occurrences of the characters in the first list in the
string by the corresponding members of the second list.

The REPLACE function in SNOBOL4 does this. Consider 
the following: 

STRING1 = 'THE BEAR IS GONE' 
TABLE1 = 'B A' 
TABLE2 = 'D;E' 
STRING2 = REPLACE(STRING1,TABLE1,TABLE2) 

STRING2 has value 'THE;DEER;IS;GONE'. 

PL/I has the built-in function TRANSLATE to accomplish 
this replacement. Its second argument holds the replacement 
characters and its third the characters to be replaced. 
Assuming the previous definitions for STRING1, TABLE1, and 
TABLE2, the statement 

STRING2 = TRANSLATE(STRING1,TABLE2,TABLE1) 

assigns to STRING2 the value 'THE;DEER;IS;GONE'. 

Using segment gaps, the problem may be coded in TRAC. 
Procedure COMMA calls every character in its argument one at 
a time. After execution of COMMA, every character except 
the last in the argument is delimited on both sides by a 
comma. 

:(DS,COMMA,( 
:(GR,:(RP,<1>),0, 
((,)::(CC,<1>)::(CL,COMMA,<1>))),)) 

Then 

:(DS,TABLE1,(B A))' 
:(DS,TABLE2,(D;E))' 
:(DS,STRING1,THE BEAR IS GONE)' 
:(SS,STRING1,:(COMMA,TABLE1))' 
:(DS,STRING2,(:(STRING1:(COMMA,TABLE2))))' 
:(PS,:(STRING2))' 

The result 'THE;DEER;IS;GONE' is printed. 

The task may be done in APL with the following code. 

STRING1←'THE BEAR IS GONE' 
TABLE1←'B A' 
TABLE2←'D;E' 
I←1 
STRING2←STRING1 
LOOP:A←STRING2⍳TABLE1[I] 
→(A=1+⍴STRING2)/INC 
STRING2[A]←TABLE2[I] 
→LOOP 
INC:→((I←I+1)>⍴TABLE1)/OUT 
→LOOP 
OUT:→0 
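
The character replacement task can also be sketched in
Python, which is offered here only as a point of comparison
and is not one of the four languages studied. The sketch
assumes Python's standard str.maketrans and str.translate
methods; the function name replace_chars is hypothetical.

```python
def replace_chars(string, table1, table2):
    """Replace each character of table1 found in string by the
    character at the same position in table2."""
    if len(table1) != len(table2):
        raise ValueError("the two tables must have the same length")
    mapping = str.maketrans(table1, table2)  # build the character map
    return string.translate(mapping)         # apply it in one pass


string1 = "THE BEAR IS GONE"
table1 = "B A"   # characters to be replaced
table2 = "D;E"   # their replacements, position by position

print(replace_chars(string1, table1, table2))  # THE;DEER;IS;GONE
```

As with REPLACE and TRANSLATE, the whole string is processed
in a single call, with no explicit loop over the tables.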



Consider the following problem in each language: find 
the index of the first nonblank character in a string. 

The SPAN function of SNOBOL4 in the statement 

STRING SPAN(' ') 

matches all blank characters in STRING up to the first 
nonblank. SPAN must match at least one character, or 
failure is indicated. A function in PL/I very similar to 
this in its effect is VERIFY, which in 

VERIFY(STRING,' ') 

returns the index of the first nonblank in STRING. It 
returns zero if STRING contains only blanks. 

The difference between SPAN and VERIFY reflects a basic 
difference in SNOBOL4's and PL/I's approach to string 
problems. SPAN is used in a pattern matching statement; the 
statement is said to succeed or to fail. On the other hand, 
VERIFY returns zero if all characters of the first string 
are present in the second string. Otherwise the index of 
the first character in the first string which is not present 
in the second string is returned. In SNOBOL4 the cursor 
operator @ may be used to find the index of success. For 
example, IND is assigned the index of the first nonblank in 
the following statement: 

STRING SPAN(' ') @IND 

Notice that two operators are necessary in SNOBOL4 to 
perform the same function that one operator, VERIFY, does in 
PL/I. Thus in a sense SPAN is a more primitive operation 
than VERIFY. This shows a difference in the languages, 
namely that in SNOBOL4 the index of a pattern match is 
separate from the pattern itself. 
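
The contrast between the two styles can be sketched in
Python. This is an illustration under stated assumptions, not
an implementation of either language: verify follows the
description of VERIFY given above, while span_cursor imitates
a SPAN match anchored at the start of the string and returns
None to stand for failure. Both function names are
hypothetical.

```python
def verify(string, allowed):
    """Return the 1-based index of the first character of string that is
    not in allowed, or 0 if every character of string is in allowed."""
    for i, ch in enumerate(string, start=1):
        if ch not in allowed:
            return i
    return 0


def span_cursor(string, chars):
    """Imitate SPAN anchored at the start of string.

    Returns the number of leading characters matched, or None when
    no character is matched (the match is said to fail)."""
    n = 0
    while n < len(string) and string[n] in chars:
        n += 1
    return n if n > 0 else None


print(verify("   ABC", " "))       # 4
print(verify("      ", " "))       # 0
print(span_cursor("   ABC", " "))  # 3
print(span_cursor("ABC", " "))     # None
```

The first function always returns an index, which the caller
must test; the second separates success or failure from the
position reached, as the pattern matching statement does.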

APL does not have a single primitive for this problem. 
However, the operation may be done using the following: 

(STRING∊' ')⍳0 

The operation might be done in TRAC as follows: 

:(DS,LOOP,( 
:(EQ,::(CC,<1>), , 
(:(DS,I,:(AD,:(I),1)) 
:(GR,:(I),:(LEN), 
(:(PS,ALL BLANKS)), 
(:(LOOP,<1>)))))))' 
:(DS,I,1)' 
:(DS,LEN,:(RP,STRING))' 
:(LOOP,STRING)' 

4.8 DISCUSSION 

The languages exhibit strengths in different areas of 
string handling. Clearly, SNOBOL4 is superior for pattern 
matching problems. This is particularly evident in the 
SNOBOL4 and PL/I pattern matching problem in section 4.6.1. 

The SNOBOL4 pattern data type gives great flexibility 
in creating and referencing patterns. In SNOBOL4 it is 
possible to find whether a pattern match is successful or 
unsuccessful without determining an index value. If a match 
is successful, replacement takes place; otherwise, no 
replacement occurs. In PL/I, however, an index value must be 
tested. 

If a general purpose programming language is needed for 
a string problem, then PL/I is usually a good language to 
use. Its INDEX and SUBSTR primitives are very powerful. 
However, there are restrictions. After all, PL/I is a 
general purpose programming language and is not dedicated to 
string handling tasks. SNOBOL4, being dedicated to pattern 
matching problems, has its main statement form designed with 
this in mind. PL/I does not, so it would require a language 
extension to make this sort of problem easy in PL/I. 

Rosin [18] has proposed modifications to PL/I to 
improve string handling. First, he suggests that the 
default for the character string type be VARYING, not FIXED. 
Specification of a string's maximum length would be 
optional. Second, he feels that the SUBSTR notation, when 
SUBSTR is used as a pseudo-variable, is confusing. Instead 
he suggests something like X(A,I:J+I-1) in place of 
SUBSTR(X(A),I,J); X(I:I) for SUBSTR(X,I,1); etc. If 
B='WXYZ', I=2, J=3, then B(I:J) = 'XY'. 

Other modifications, modeled somewhat after SNOBOL4, 
would make pattern matching and replacement easier. Rosin 
defined five new operators to be used: UPTO, BEFORE, AFTER, 
FROM, and IN. If X='ABCDEFG' and Y='DE', then X UPTO Y is 
'ABCDE', X BEFORE Y is 'ABC', X AFTER Y is 'FG', X FROM Y is 
'DEFG', and Y IN X is 'DE'. Two or more of these operators 
may be used in the same expression. For example, 
X FROM Y BEFORE 'G' is 'DEF'. 

As in SNOBOL4, if Y is not present in X for any of the 
operations, the scan fails. Any expression involving any of 
the five operations may be written on the left hand side of 
a statement; Rosin refers to this as a pseudo-expression. 
For example, 

Z = 'CAT' 
'A' IN Z = 'O' 

changes Z to 'COT'. 
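
The behavior of the five operators, as described above, can
be sketched in Python. This is an illustration only and not
Rosin's definition or any PL/I implementation: the function
names (upto, before, after, from_, in_, assign_in) are
hypothetical, and a failed scan is represented by None.

```python
def _scan(x, y):
    """Return the position of the first occurrence of y in x,
    or None if x is a failed result or y is not present."""
    if x is None:
        return None
    i = x.find(y)
    return None if i < 0 else i


def upto(x, y):
    i = _scan(x, y)
    return None if i is None else x[:i + len(y)]   # through the end of y


def before(x, y):
    i = _scan(x, y)
    return None if i is None else x[:i]            # everything preceding y


def after(x, y):
    i = _scan(x, y)
    return None if i is None else x[i + len(y):]   # everything following y


def from_(x, y):
    i = _scan(x, y)
    return None if i is None else x[i:]            # y and everything after


def in_(y, x):
    return None if _scan(x, y) is None else y      # y itself, if present


def assign_in(y, x, new):
    """Model the pseudo-expression  y IN x = new : replace the first
    occurrence of y in x by new, or return None if the scan fails."""
    i = _scan(x, y)
    return None if i is None else x[:i] + new + x[i + len(y):]


X, Y = "ABCDEFG", "DE"
print(upto(X, Y), before(X, Y), after(X, Y), from_(X, Y), in_(Y, X))
print(before(from_(X, Y), "G"))    # DEF
print(assign_in("A", "CAT", "O"))  # COT
```

Composition such as X FROM Y BEFORE 'G' is written here as
nested calls, with a failure in the inner scan propagating
outward as None.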

There are disadvantages to these suggestions which 
Rosin himself brings up. The words UPTO, BEFORE, AFTER, 
FROM, and IN might have to be reserved words in PL/I, 
contradicting the PL/I design of no reserved words. Further, 
pseudo-expressions make the equals sign ambiguous. 
For example, consider: 

DCL C BIT(2), D BIT(5); 
D UPTO C = '1'B = '1'B; 

In the above statements either of the two = signs could be a 
comparison and the other an assignment operator. 

APL does have some primitives useful in string 
handling, but it is in need of some sort of PL/I-like INDEX 
function before it could be used extensively in pattern-type 
problems. In any sort of string operation in APL, one must 
not lose sight of the fact that character strings are arrays 
of characters. This feature in the APL design prevents good 
string handling, as there is no string, just an array of 
characters. This leads to problems when it is desired to 
treat a group of characters non-uniformly. For example, it 
would be nice to have the ability to find the index of the 
first occurrence of a certain word in an APL character 
vector. Unfortunately, with the iota operator, the 
character vector will be searched for the first occurrence 
of each letter of the word individually. A vector result 
will be returned, and further manipulation is necessary to 
get the correct answer. (See third example in Chapter 3.) 
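
The difference between a letter-by-letter search and a
search for the word as a unit can be sketched in Python. The
function iota below is a hypothetical imitation of the APL
index operation as described above, assuming index origin 1
and a result of one more than the vector length for a
character that is absent; it is not APL itself.

```python
def iota(vector, items):
    """For each character of items, return the 1-based index of its
    first occurrence in vector, or len(vector) + 1 if it is absent."""
    return [
        vector.index(c) + 1 if c in vector else len(vector) + 1
        for c in items
    ]


text = "THE CAT SAT"

# Letter by letter: each character of the word is located on its own.
print(iota(text, "CAT"))      # [5, 6, 1]

# As a unit: the position at which the whole word begins.
print(text.find("CAT") + 1)   # 5
```

The first result locates C, A, and T independently, and the T
found is the one in THE; only the second result answers the
question of where the word occurs.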

Thus, for good string handling in APL, it is necessary 
to be able to treat a sequence of character array elements 
as a string. Possibly an operator could be introduced to 
produce a string from a character array. Then the result 
could be used in string operations like those of PL/I and 
SNOBOL4. It would also be desirable to be able to operate 
on sequences of differing lengths. This would facilitate 
comparison of strings of differing lengths. 

TRAC may indeed be useful in text editing applications 
when used interactively, but any real usefulness was not 
evident from this investigation. Any operation that needs 
to be done more than once must be coded to be recursive 
since there is no iteration operator. Errors caused by 
mismatching parentheses and choosing the wrong mode are hard 
to find. Also, TRAC makes it difficult to structure a 
program. 

I feel that TRAC is much too difficult to learn and, 
even when learned, is still difficult to use. Unlike PL/I, 
where a programmer has to know only a small subset of the 
language to write programs, a novice TRAC programmer must be 
aware of all the TRAC nuances before he can code in the 
language. 

Faced with the problem of choosing one of the four 
languages for a text editing system, system-implementation 
questions aside, I would choose SNOBOL4. SNOBOL4 gives the 
ability not only to perform pattern matching easily, a 
necessity in text editing, but also to perform many other 
kinds of string operations easily. PL/I, APL, and TRAC do 
not have good pattern matching facilities. These three 
languages would of course be more useful for string handling 
if additional string operators were added to the language. 